Eleven thousand persons have suffered death rather than submit to break eggs at the smaller end.
Compiled for a dozen reasons many dozens of times, the last of which was on February 3, 2024. 


€ 0, $0 USD, 0 SEK, $50 CAD, 6 × 10^29 Pengő, 100 JPC. Mieux vaut vivre avec des remords qu'avec des regrets.


Legal Note: Did you personally sign a copy of the Official Secrets Act of 1939 before receiving this 
document? If not, then there's probably no law against sharing it. 


Reprints: Bitrot will burn libraries with merciless indignity that even Pets Dot Com didn't deserve. Please 
mirror—don’t merely link!—pocorgtfo22.pdf and our other issues far and wide, so our articles can help 
fight the coming flame deluge. We like the following mirrors. 

https://unpack.debug.su/pocorgtfo/ https://pocorgtfo.hacke.rs/ 

https://www.alchemistowl.org/pocorgtfo/ https://www.sultanik.com/pocorgtfo/ 

git clone https://github.com/angea/pocorgtfo 


Technical Note: The electronic edition of this magazine is valid as both PDF and ZIP. Thanks to Ange 
Albertini, it is also a polymock with many bogus file type signatures — check page 11. 


Printing Instructions: Pirate print runs of this journal are most welcome! PoC||GTFO is to be printed 
duplex, then folded and stapled in the center. Print on A3 paper in Europe and Tabloid (11” x 17") paper 
in Samland, then fold to get a booklet in A4 or Letter size. Secret volcano labs in Canada may use P3 
(280 mm x 430 mm) if they like, folded to make P4. The outermost sheet with pages 1, 2, 67 and 68 should 
be on thicker paper to form a cover. 


# This is how to convert an issue for duplex printing.
sudo apt install texlive-extra-utils 
pdfbook2 --short-edge --paper=a3paper --no-crop pocorgtfo22.pdf 


Man of The Book Manul Laphroaig 
Scooby Bus Driver Ryan Speers 
Editor of Last Resort Melilot 
TeXnician Evan Sultanik
Editorial Whipping Boy Jacob Torrey 
Funkmaster of File Formats Ange Albertini 
Assistant Scenic Designer Philippe Teuwen 
Stunts (Uncredited) Alexei Bulazel 
with the good assistance of 
Tree Killer EVM 


22:01 Need something good to read, my good neighbor? 


Neighbors, please join me in reading this twenty- 
third release of the International Journal of Proof 
of Concept or Get the Fuck Out, a friendly little 
collection of articles for ladies and gentlemen of dis- 
tinguished ability and taste in the field of reverse 
engineering and the study of weird machines. This 
release is a gift to our fine friends in Washington, 
D.C. 


BALDWIN. 


ANNOUNCES ANOTHER NEW ADDITION 
TO ITS GROWING FAMILY OF PHOTO- 
ELECTRIC SHAFT POSITION ENCODERS 


SERIAL, NATURAL BINARY 


NATURAL BINARY, SERIAL OUTPUTS 


The Serial Translator is a plug-in unit for the standard 
200-Series Encoder accepting an 11 or 12-bit parallel 
word at the input and converting it to a serial natural 
binary word which can be transmitted over one line. 


The binary bits are 10 microseconds wide and 10 micro- 
seconds apart. The amplitude of the pulses is 4.5V open 
circuit, 2.3V into a 50 ohm load. Outputs can be in either 
polarity. The serial output is especially desirable for long 
cable runs, telemetering links, etc. 


Interrogation for the serial translator can be furnished 
by a free-running or externally driven clock. The word 
rate of the free-running clock is set at a nominal 1 
K.C. (1000 word/sec.). The maximum word rate for the 
externally driven clock is 4 K.C. (4000 word/sec.). 


The encoder and translator combination is 2.6” dia. 
and 5%” long. Clock housing is 2-5/6” x 54" x 44". 


Model 222 Encoder (11-bit) $450.00
Serial translator $248.00
Clock $269.00
(lots of 1 through 10)
P.S. — Our family of standard encoders now numbers 
over 80, and we are proud of every one of them. 


BALDWIN ELECTRONICS, INC.


1101 McALMONT STREET • P.O. BOX 627 • CODE 501-375-7351 
LITTLE ROCK, ARKANSAS 72203 


If you are missing the first twenty two releases, 
we suggest asking a neighbor who picked up a copy 
of the first in Vegas, the second in Sao Paulo, the 
third in Hamburg, the fourth in Heidelberg, the fifth 
in Montréal, the sixth in Las Vegas, the seventh 
from his parents’ inkjet printer during the Thanks- 
giving holiday, the eighth in Heidelberg, the ninth in 
Montréal, the tenth in Novi Sad or Stockholm, the 
eleventh in Washington D.C., the twelfth in Heidel- 
berg, the thirteenth in Montréal, the fourteenth in 
Sao Paulo, San Diego, or Budapest, the fifteenth 
in Canberra, Heidelberg, or Miami, the sixteenth 
release in Montréal, New York, or Las Vegas, the 
seventeenth release in Sao Paulo or Budapest, the 
eighteenth release in Leipzig or Washington, D.C., 
the nineteenth in Montréal, the twentieth in Heidel- 
berg, Knoxville, Canberra, Baltimore, or Raleigh, 
the twenty-first in Leipzig or Washington, D.C., or 
the twenty-second in D.C. Three collected volumes 
are available from No Starch Press, wherever fine 
books are sold. 


On page 5, Travis Goodspeed shares his tools 
for reverse engineering a photograph of a mask ROM 
into an ASCII art bitstream, and then converting that
physically ordered bitstream into logically ordered 
bytes that might work in a disassembler or emula- 
tor. If you need to reverse engineer microcontroller 
firmware from before flash memory became cheap 
and plentiful, this is the tool for you. 


Ange Albertini wrote PoC||GTFO 7:6, the clas- 
sic article on abusing file formats with polyglots. On 
page 11, he presents a follow-up with better classifi- 
cations and the idea of “polymocks,” which are not 
polyglots but easily confuse libmagic and its friends 
into believing that a file is valid in dozens of formats.


Eighty years ago, C.S. Lewis published the 
Screwtape Letters, a classic of apologetics presented 
as letters from a senior demon named Screwtape to 
his junior nephew, Wormwood. On page 17, Pas- 
tor Laphroaig shares with us a more recent set of 
mis-delivered letters, in which Wormwood—now a 
senior demon—writes to his young nephew Malört 
about modern video clips, computer programming 
and how hard it is for a concerned demon to earn 
the wages of sin. 


On page 19, Ange presents a series of tricks 
building up to generic, reusable hash collisions for 
tarballs and zipped XML files, such as .docx files. 


We make this airmover unit, 


Vaneaxial gas bearing fan, ultra-high reliability, 
used in Minuteman. Only IMC produces it. 


and this unit, 


The Boxer®, standard sized, distributor-stocked. 
Rugged metal frame, our own impeller and 
motor, moves air economically and reliably. 


and this unit, 
Tubeaxial, the IMCool is distributor stocked in 


10 types, inputs of 60, 400, and 1000 cps, air 
delivery from 18 to 108 cfm. 


and this unit, 
Rackmounted double blower. This one installed 


right in the broadcast area, delivers 500 quiet 
cfm to a high-power broadcast transmitter. 


but this unit 
will design and build an airmover 


to fit your specifications. 


IMC is the single-source 
specialist for airmovers. 
We design and build the 
entire system — motor, 
blade, bearing, housing— 
and can assume full re- 
sponsibility in meeting your 
needs. For quick response,
contact Sales Dept., Eastern
Division, Telephone
(516) 334-7070, or TWX
516 333-3319.

Plants: Westbury, N. Y.;
Maywood, Calif.; Rochester,
N.H.; and Tempe, Ariz.
Products: Airmovers; induction,
hysteresis, synchronous, torque,
servo, and stepper motors;
synchros; solenoids; pressure
switches; hydraulic and
pneumatic valves; and fuel
atomizers. For more data,
write:

IMC Magnetics Corp. Marketing Division,
570 Main Street, Westbury, N.Y. 11591.


Windows, LLVM and Grsecurity all have control 
flow integrity schemes that can restrict the targets of 
indirect calls, such as function pointers. Aleksandar 
Nikolic has been playing with the eXtended Flow 
Guard scheme from Windows 11, using the hashed 
integrity markers as a means of reverse engineering 
the calling conventions of functions. What began 
as a mitigation against memory corruption exploits 
has become an oracle for reverse engineering! 

Stefan Kölbl and Ange Albertini have been playing
around with CTR mode, coming up with near-polyglots
that have a different meaning and file format
for each of a few different key/nonce pairs. Page
30.

A long time ago in an evil empire far away, the 
Soviet Union’s consumer electronics monopoly pro- 
duced a pocket calculator, the Электроника MK- 
51. This looks exactly like Casio’s fx-2500, and on 
page 32, Travis Goodspeed deconstructs both cal- 
culators to show that the MK-51 counterfeits not 
just the look and feel of the Casio, but also its NEC 
microcontroller and every last bit of mask ROM. 

We’ve recently been including tourist guides to 
new computer architectures, and this release is no 
exception. Christopher Hewitt and Niccolò Izzo describe
the M16C and R8C series of microcontrollers
from Renesas on page 39, beginning with the basics 
and working their way up to a fault injection attack. 
EVM can't let them have all the fun, so page 46 
presents his guide to the Elbrus 2000 architecture, 
Russia's domestically designed VLIW architecture 
with register windowing. 

Harvey Phillips shares on page 52 his Janus polyglot
from the Binary Golf Grand Prix. It’s valid as
an x86 bootloader, ELF, COM, RAR, and a GNU
Multiboot2 image, but also as a program for the Commodore
64! To keep the size to a minimum, many
of these formats have useful sections overlapping.

On page 68, we pass the collection plate, not 
for bitcoins or wooden nickels, but for nifty stories. 
What fine stories do you have, left untold except at 
your local pub? With what clever tricks might you 
grace our readers? 


22:02 A Mask ROM Bit Extraction Tool


by Travis Goodspeed


Lately I've been writing a book on extracting
firmware from locked microcontrollers; rather, trying
to write that book, because I fell into a rabbit
hole of mask ROM reverse engineering. So for a few
months, instead of writing prose, I wrote a tool in
C++ for extracting bits from ROM photographs and
a matching tool to decode those bits into logically
ordered bytes, suitable for disassembly or emulation.¹

Let's begin with a little background: SRAM,
DRAM, Flash ROM, and EPROM memory technologies
hold bits invisibly as some form of electrical
charge. Mask ROM is different, in that bits are
written into one of the lithography masks that produce
the chip. This is very expensive per unique
program, but very cheap per chip.

Many chips include a nice, orderly grid of bits
that contain code or data. Sometimes this is encoded
in metal vias, which you can see from the
surface of a decapsulated chip. Sometimes bits are
in the diffusion layer, and you need to remove the
upper layers of the chip with hydrofluoric acid to expose
them. And sometimes bits are implanted into
the doping difference between P and N silicon, requiring
a procedure called a “Dash Etch” to stain a
color difference into the bits after exposing them.

Whatever the chemical procedure, the end re- 
sult from the lab is a panorama photograph of the 
ROM under high magnification, with bits visible to 
the naked eye. Like the ones on page 6, you will 
see that some bits are bright while others are dark. 
These are our ones and zeroes! 

If you are more patient than I, you might type 
these in manually, reading the ones and zeroes off 
the page in the same way that as children we typed 
BASIC program listings out of magazines. Instead, 
it’s nice to let a computer do that work, with a hu- 
man providing a minimum amount of guidance. 


Prior Work 


The first of these tools to be published was Rompar 
by Adam Laurie in 2013.² It’s a GUI application
in Python, in which the operator draws a grid to 
mark the bit positions. OpenCV takes care of a bit 
of image preprocessing, to make the bits stand out 
by tossing away unneeded color channels. 

Published later in 2019, but perhaps written earlier,
is Chris Gerlinsky's Bitract.³ The user first
loads an image and then describes an “area of inter- 
est,” a box containing so many rows and columns of 
bit positions. 

These tools certainly work, but they have some
problems that frustrated me enough to write some- 
thing new. 

Bitract requires a commercial image processing 
library to compile in Borland C++. As a Windows 
program, it ignores command line parameters and 
has no CLI. As a Python script, Rompar makes too 
much use of command-line parameters, requiring the 
row and column count of each bit grouping to be de- 
fined before startup rather than worked out on the 
fly. 

Both Rompar and Bitract expect bits to be 
perfectly ordered in a grid, which is great for re- 
ducing the operator's labor, but difficult on very 
large projects, where camera or stitching distortions 
might move something just barely out of the grid. 
It's also inconvenient on some 4-bit microcontrollers, 
where the final group of bits sometimes has fewer 
columns than the others. 


¹ git clone https://github.com/travisgoodspeed/maskromtool

² git clone https://github.com/AdamLaurie/rompar

³ git clone https://github.com/SiliconAnalysis/bitract
Figure 1: Font ROM from a TMP47C434N


Bitract uses the mouse wheel to zoom, which is 
infuriating on a multitouch pad that ought to be 
able to both zoom and pan. Rompar, by contrast, 
traps the user at the native resolution of the image. 

I write these things not to criticize their work. 
These were damned handy tools for their day. I just
think that things could be more convenient. 


A Fresh Start 


So I began from scratch, with a design on paper.

For starters, I decided to support both a GUI 
and a CLI. I wrote the GUI in Qt6, for portability 
to Linux, Windows and macOS. The CLI is handy 
for regression testing and scripting, but it is strictly 
optional. All important features are available with- 
out it. 

Like Rompar, I chose JSON as a save format. 
Like Bitract, my tool shows handy histograms to 
quickly choose the best bit threshold. 

I decided to avoid having a strict grid of bits, 
but instead to use other objects to generate bit po- 
sitions. In my early drafts, this was done by drawing 
row and column lines, then identifying their crossing 
points as bit locations. Because only the bit loca- 
tions really matter, there's plenty of time to add 
support for marking grids later. 
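
As a rough illustration of the row-and-column idea, here is a minimal Python sketch. It assumes each drawn line is stored as a pair of endpoints; the function names and the data layout are illustrative and are not taken from MaskRomTool itself. Bit positions then fall out of ordinary segment intersection.

```python
from itertools import product

def intersect(row, col):
    """Return the crossing point of two line segments, or None if they miss."""
    (x1, y1), (x2, y2) = row
    (x3, y3), (x4, y4) = col
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # parallel lines never cross
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def bit_positions(rows, cols):
    """Every crossing of a row line with a column line is a candidate bit."""
    points = []
    for row, col in product(rows, cols):
        point = intersect(row, col)
        if point is not None:
            points.append(point)
    return points

rows = [((0, 10), (100, 10)), ((0, 20), (100, 20))]
cols = [((20, 0), (20, 50)), ((40, 0), (40, 50))]
print(bit_positions(rows, cols))
```

Because only the resulting points matter downstream, any other way of producing the same list of positions would serve equally well.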

Since this is a CAD program at heart, I took a
few lessons from the PCB layout tools that I reg- 
ularly cuss at. Design Rule Checks (DRCs) were 
written early, implementing such features as iden- 
tifying overlapping bits, ensuring that each row is 
the same length, and sanity checking the design in 
other ways. Each DRC violation has a position in 
the project view, appearing as a yellow box beneath 
the bits but above the photograph. DRC violations 
can also be used for providing other feedback; the 
list isn't necessarily restricted to errors. 
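
To make the DRC idea concrete, here is a small Python sketch of two such checks. The `Violation` record, the function names, and the two-pixel default distance are assumptions made for illustration; the real tool is written in C++ and its rules may differ.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot

@dataclass
class Violation:
    x: float          # position to highlight in the project view
    y: float
    message: str

def check_overlaps(bits, min_dist=2.0):
    """Flag any bit that sits closer than min_dist pixels to an earlier bit."""
    found = []
    for i, (x1, y1) in enumerate(bits):
        for x2, y2 in bits[:i]:
            if hypot(x1 - x2, y1 - y2) < min_dist:
                found.append(Violation(x1, y1, "Overlapping bits"))
                break
    return found

def check_row_lengths(rows):
    """Flag every non-empty row whose length differs from the most common one."""
    if not rows:
        return []
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    return [
        Violation(row[0][0], row[0][1],
                  f"Row has {len(row)} bits, expected {expected}")
        for row in rows
        if row and len(row) != expected
    ]
```

Each check returns positioned records rather than raising an error, so the same list can carry warnings and other feedback as well as hard failures.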


[Screenshot: the DRC Rule Violations list showing "Ambiguous bit 0,1565", and the Settings window's Sampling tab with its Bit Histogram, axis running from 0 to 255.]


Determining a Bit's Value 


Knowing the position of a bit, how do we determine 
whether it's a one or a zero? The short answer is 
that we can look at how bright or dark it is, but 
there are some complications to consider. 

The first is the color space. Often the bits are
distinct in one color channel but absolutely indistin- 
guishable in another. And once we know the right 
channel, we must select the right threshold to dis- 
tinguish them. 

I found that by drawing a histogram of the num- 
ber of bits of a given color value, I could quickly 
see a bimodal distribution in some channel between 
ones and zeroes. Sliding a channel threshold auto- 
matically updates all visible bits, as well as updating 
a marker in the histogram to visualize where your 
threshold is set. 
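
The histogram and threshold can be sketched in a few lines of Python. This assumes one RGB sample per bit has already been taken, and that a dark sample means a one; both the function names and that polarity are assumptions, since the right polarity depends on the chip.

```python
def histogram(samples, channel):
    """Count how many bit samples have each 0..255 value in one color channel."""
    bins = [0] * 256
    for rgb in samples:
        bins[rgb[channel]] += 1
    return bins

def classify(samples, channel, threshold):
    """Assumed polarity: samples darker than the threshold read as ones."""
    return [1 if rgb[channel] < threshold else 0 for rgb in samples]

def ambiguous(samples, channel, threshold, margin=8):
    """Indices of samples too close to the threshold to trust."""
    return [i for i, rgb in enumerate(samples)
            if abs(rgb[channel] - threshold) < margin]

# Four made-up samples: red is useless here, green separates the bits cleanly.
samples = [(200, 30, 90), (198, 220, 95), (205, 25, 92), (201, 215, 140)]
print(classify(samples, 0, 128))   # red channel
print(classify(samples, 1, 128))   # green channel
```

Plotting `histogram(samples, channel)` for each channel is what reveals the bimodal distribution; the threshold then belongs in the valley between the two peaks.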

I don't use OpenCV or similar libraries to pre- 
process my images. Rather, I’ve found that most 
implant and contact ROMs consistently have a color 
difference that can be found on a single pixel when 
working with losslessly compressed images. 

Diffusion ROMs are a little different, in that they 
are low in the chip, and when the chip has been pro- 
cessed a little too long, there's no color difference in 
the bit's center. Rather, the bit has a dark border. 
For these and other edge cases, my tool abstracts
away the measurement as a class that returns an
RGB triplet. To support this edge case, I simply
wrote classes that measured a thin horizontal or vertical
strip of pixels, returning the darkest point in
each color channel along that strip.
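
A minimal Python sketch of that abstraction follows. The class names and the image layout (a list of rows of RGB tuples) are assumptions for illustration, not the C++ classes of the tool.

```python
class PixelSampler:
    """Default case: read the single pixel at the bit's position."""
    def sample(self, image, x, y):
        return image[y][x]

class HorizontalStripSampler:
    """Diffusion-ROM case: take the darkest value per channel along a strip."""
    def __init__(self, half_width=2):
        self.half_width = half_width

    def sample(self, image, x, y):
        row = image[y]
        lo = max(0, x - self.half_width)
        hi = min(len(row), x + self.half_width + 1)
        strip = row[lo:hi]
        # The darkest point is found independently in each color channel.
        return tuple(min(pixel[c] for pixel in strip) for c in range(3))

# One image row: a bright bit center at x=2 with dark borders on either side.
image = [[(200, 200, 200), (40, 60, 50), (210, 190, 205), (50, 45, 70), (200, 200, 200)]]
print(PixelSampler().sample(image, 2, 0))
print(HorizontalStripSampler(1).sample(image, 2, 0))
```

Since both samplers return an RGB triplet, the histogram and threshold logic does not need to know which one produced the measurement.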


Aligning Bits Into Rows 


After the user draws lines for all the rows and 
columns of the ROM, we take those intersecting 
points and produce a set of bit positions. Because
we don't have a definite grid, it's necessary to align 
these bit objects into rows. 

Initially I solved this problem very inefficiently,
implementing a function to find the next-to-the-right
of any bit by restricting its angle and marking
all bits I'd already passed. This worked great for
small ROMs, but it scaled horribly, and by one hundred
kilobits it was taking twenty minutes to align
the bits!

To come up with a faster algorithm, I realized 
that sorting all of the bits by their X coordinate 
would almost group them into columns. The ex- 
ceptions come from the image's tilt. Sometimes the 
leftmost bits of the second column are to the left of 
the rightmost bits of the first column. 

We can therefore identify the leftmost bits by 
following the sorted list. Whenever the Y gap is 
small, say less than a few times the average gap, 
we're still on the first column and we've identified 
another row header. If the Y gap is large, we're see- 
ing a bit from the second column, and we ought to 
pass it by. When we start to see many large gaps, 
we've passed the first column entirely and know all 
the row header bits. 


We've finally released our industry-standard data interfaces for use with the Timex-Sinclair 
2000, giving you so vast an array of computer peripheral choices that even trying to make up 
your mind may push you right over the edge! 


PARALLEL INTERFACE $69.95
The amazing Centronics Parallel Interface lets you connect your TS2000 to just about any dot
matrix printer and most other parallel devices. You'll have the option to use the printer's standard
font or the TS2000 display font! What's more, we'll throw in printer driver software that supports
LPRINT and LLIST absolutely free!


RS232-C SERIAL INTERFACE $89.95

Ever fantasized about connecting a letter-quality printer to your T2000? What about a modem? 
With the spectacular RS232-C Serial Interface you can do both — at the same time! That's right, 
we give you two channels of RS232-C power on one board! Bit transfer rate is adjustable from 
300 to an incredible 19200 baud. As if that weren't enough, you get free driver software that sup- 
ports LPRINT and LLIST! 


Send your check or money order to: 


Box 18093, Austin, TX 78760 
AERCO


⁴ git clone https://github.com/JohnDMcMaster/zorrom


So to align the bits, I first build an array of the
rightmost bit of each row that I’ve passed so far.
This array is seeded with the row header bits at the
far left. I then walk through the sorted list of all
remaining bits, overwriting their nearest row element
in the array after updating the old bit's next-to-right
pointer to aim at the new bit.

This is lightning fast, reliably arranging hun- 
dreds of thousands of bits in the blink of an eye. 
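
The two passes described above can be sketched in Python as follows. This is a simplified reading of the prose, not the tool's C++ code: the `Bit` class, the caller-supplied `row_pitch`, and the `gap_factor` and `max_misses` values are all assumptions, and the header heuristic needs a ROM with many more rows than `max_misses` to behave.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Bit:
    x: float
    y: float
    next_right: Optional["Bit"] = None

def find_row_headers(bits_by_x, row_pitch, gap_factor=3.0, max_misses=4):
    """Walk the X-sorted list; small Y gaps are first-column bits."""
    headers = [bits_by_x[0]]
    misses = 0
    for bit in bits_by_x[1:]:
        if abs(bit.y - headers[-1].y) < gap_factor * row_pitch:
            headers.append(bit)      # still on the first column
            misses = 0
        else:
            misses += 1              # a bit from a later column, pass it by
            if misses > max_misses:
                break                # many large gaps: first column is done
    return headers

def align_rows(bits, row_pitch):
    """Link every bit to its right-hand neighbor; return the row headers."""
    ordered = sorted(bits, key=lambda b: b.x)
    headers = set(find_row_headers(ordered, row_pitch))
    rightmost = sorted(headers, key=lambda b: b.y)   # one slot per row
    row_starts = list(rightmost)
    for bit in ordered:
        if bit in headers:
            continue
        # Nearest row by Y; a linear scan keeps the sketch short.
        i = min(range(len(rightmost)), key=lambda k: abs(rightmost[k].y - bit.y))
        rightmost[i].next_right = bit
        rightmost[i] = bit
    return row_starts

def walk_row(head):
    """Follow the next_right pointers from a row header."""
    bit = head
    while bit is not None:
        yield bit
        bit = bit.next_right
```

Sorting costs O(n log n) and each remaining bit is touched once, which is why this approach scales so much better than searching for every bit's neighbor individually.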


From Physical Bits to Logical Bytes 


By this point in the article, you should understand 
how you might use MaskRomTool to mark the bits 
of a photograph and arrange them into a table of bit 
values. You also understand how the DRC mecha- 
nism might flag bits which are too near the threshold 
between a one and a zero. But there's a very im- 
portant piece we haven't yet covered: How does the 
software convert this table of physically-ordered bits 
into logically-ordered bytes? 

Let's begin with the prior art. John McMaster's 
Zorrom tool is built as a set of Python scripts with 
libraries for CH340, LC5800, LR35902, MCS48, 
PIC1670, and some TMS320 chips.⁴ For those chips
that it doesn't directly support, it has a solver fea- 
ture that will attempt many permutations of decod- 
ing until the bytes match a defined pattern, such 
as setting the stack pointer in the first instruction. 
John's solver works for roughly half of targets, and 
it’s far easier than manually guessing permutations. 
This was the tool that I used until I recently wrote 
my own decoder. 
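
The solver idea can be illustrated with a toy brute-force search in Python. This is not Zorrom's code or its actual search space; the handful of transformations, the row-wise MSB-first decoding, and the function names are assumptions chosen to keep the sketch short.

```python
from itertools import product

def rotate(table):
    """Rotate a table of bits 90 degrees clockwise."""
    return [list(row) for row in zip(*table[::-1])]

def flip(table):
    """Mirror each row left to right."""
    return [row[::-1] for row in table]

def invert(table):
    """Swap ones and zeroes."""
    return [[b ^ 1 for b in row] for row in table]

def decode_rows(table):
    """Pack each row left to right, MSB first, eight bits per byte."""
    out = bytearray()
    for row in table:
        for i in range(0, len(row) - 7, 8):
            byte = 0
            for b in row[i:i + 8]:
                byte = (byte << 1) | b
            out.append(byte)
    return bytes(out)

def solve(table, matches):
    """Yield every (settings, bytes) pair whose decoding satisfies matches()."""
    for rotations, flipped, inverted in product(range(4), (False, True), (False, True)):
        t = table
        for _ in range(rotations):
            t = rotate(t)
        if flipped:
            t = flip(t)
        if inverted:
            t = invert(t)
        data = decode_rows(t)
        if matches(data):
            yield (rotations, flipped, inverted), data
```

The `matches` callback plays the role of the defined pattern, for example a check that the first bytes form a plausible stack pointer setup for the target CPU.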

Chris Gerlinsky's BitViewer uses a totally
different strategy.⁵ Rather than automatically
searching permutations, it instead graphically
displays the bits with adjustable grouping into
columns. This helps a human operator explore
the layout, while overdosing on caffeine in a hyperfocused
fugue until eventually the bits make sense.
This understanding doesn’t come easily, but I and 
others have done it. 

I wanted the best of both these worlds. From 
Zorrom, I wanted a CLI tool that could quickly pro- 
cess my projects, driven by a Makefile to rerun them 
in order to catch regressions in my decoder or im- 
ages. I also desperately needed a good search fea- 
ture, and Zorrom was the only example of such a 
thing when I started. And from BitViewer, I wanted 


⁵ git clone https://github.com/SiliconAnalysis/bitviewer


111110111111100001111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011111111111111111111111111111111 
111110111111100011001111000011011101110111111111 
111110111111100001011100001111011101101111111111 
111110110111100011110101000111001100011111111111 
111110110111100000001100001111011110101111111111 
111110111111100001010101000111001100101111111111 
111110111111100010100100001111001110011111111111 
111110111110100001010100001111001100101111111111 
111110111111100011111000001001011111110011111111 
111100111111010011110101110111101110010111111011 
111100110111010010100111111111011010110011111110 
111100111111010001001111011011110011101110010110 
111100111111010011111111111111110001111101110010 
111100110111010011111110111111110011111110000110 
111100111111010001011111101011110011101100110010 
111100111100010011111111111111100001111110010110 
111100111111010010100011111101101110010001011111 
101100111111010001010101110110100010000110000000 
111100111111010010100111111101010010110000000011 
111100111111010011111110111101111111111111101111 
111100111111010011111111101001111111111111101111 
111100111111010011001111011001111111111111101111 
101100111111010011111111111101111101111111101111 
011100110100010011111111111101101101111111101111 
111100111111010000000011111100100010000000000010 
111000110110001001010000110100100010000110010000 
011000110110001010100011001010010010110010010101 
101000110010001011111011111110111111111111111101 
101000110010001011111011111110111111111111111101 
011000110010001011111011111110111111111111101101 
111000110010001011111011111110111111111111111101 
111000110100001011101011011010111101111101111001 
111000111111001000000011111100100000000000000000 
111000111111001001010101110100100010001100010000 
101000111111001010100111111011010010110010011111 
011000111111001011111110111111111111111101111111 
011000111111001011111111001111111111111111111111 
101000111111001011111111111111111111111101101011 
111000111111001011111111111111111101111111101101 
111000110100001011111111111111111101111111111111 
111000111111001000000011111100100010000000010010 
010010111111000111110101110101100010000110010010 
110010110111000110100111111011010010110010011111 
110010111111000101011111111111111111111111111011 
110010111111000111111111111111111101111111101111 
110010110111000111111110111111111111111111101111 
010010111111000101011111001111111111111111111111 
100010111100000111111111111111111101111111111111 
110010111111000110100011111111100010000000010010 
110010111111000110101111000011011100100011111011 
110010111111000100001100000011001100000010001110 
110010110111000110100101000001000000000010010110 
110010110111000100000100000011000010000010011110 
110010111111000100001101000001000100000010101110 
110010111111000110100100000011000010000010010010 
110010111100000100001100000011000100000010101110 
110010111111000110101000000011011110100001011111 


address  data
000  78 84 84 84 84 84 84 78 10 30 10 10 10 10 10 38
010  78 84 04 08 10 20 40 FC 78 84 04 18 04 04 84 78
020  18 28 48 88 88 FC 08 08 FC 80 80 F8 04 04 84 78
030  78 84 80 F8 84 84 84 78 FC 04 04 08 10 20 20 20
040  78 84 84 78 84 84 84 78 78 84 84 7C 04 04 84 78
050  10 28 44 82 FE 82 82 82 FC 82 82 FC 82 82 82 FC
060  FC 82 80 80 80 80 82 FC F8 84 82 82 82 82 84 F8
070  FE 80 80 FC 80 80 80 FE FE 80 80 FC 80 80 80 80
080  7C 82 80 80 8E 82 82 7C 82 82 82 FE 82 82 82 82
090  38 10 10 10 10 10 10 38 3E 08 08 08 08 88 88 70
0A0  82 84 88 F0 90 88 84 82 80 80 80 80 80 82 82 FE
0B0  82 C6 AA 92 82 82 82 82 82 C2 A2 92 92 8A 86 82
0C0  38 44 82 82 82 82 44 38 FC 82 82 FC 80 80 80 80
0D0  38 44 82 82 82 8A 44 3A FC 82 82 FC 90 88 84 82
0E0  7C 82 40 38 04 02 82 7C FE 10 10 10 10 10 10 10
0F0  82 82 82 82 82 82 82 7C 82 82 82 82 82 44 28 10
100  92 92 92 92 AA 44 44 44 82 44 28 10 10 28 44 82
110  82 44 28 10 10 10 10 10 FE 04 08 10 10 20 40 FE
120  00 10 10 10 FE 10 10 10 00 00 00 00 FE 00 00 00
130  00 00 10 00 00 10 00 00 00 10 92 54 38 54 92 10
140  00 40 20 10 08 10 20 40 00 08 10 20 40 20 10 08
150  C0 C0 C0 C0 C0 C0 C0 C0 F0 F0 F0 F0 F0 F0 F0 F0
160  FC FC FC FC FC FC FC FC FF FF FF FF FF FF FF FF
170  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Figure 3. Standard character data (from address 2000H)

Figure 2. Typical character
Note. The data area for one character is 8 bytes. The starting address
is the value entered at the character ROM address (00-2EH) specified by
the upper 6 bits of the 9-bit program area (000-17FH).


Figure 2: Extracted Bitstream and Datasheet Bytes of the TMP47 Font ROM



some graphical connection to my project file, so that 
a nearly correct guess could be explored until the er- 
ror was found. 

My decoder is called GatoROM. It is used ei- 
ther as a CLI tool without the GUI overhead of 
MaskRomTool, or as a C++ library within the GUI. 


Since Zorrom was the gold standard of solving 
for unknown layouts, I began my decoder with the 
complete set of Zorrom permutations from an ar- 
bitrary bitstream, essentially exporting every case 
that it would investigate. Once I could match de- 
codings of all these, and also permute between all 
settings, I knew that my solver had feature parity 
with McMaster's. 

GatoROM uses its own class to represent a bit, 
different from that in MaskRomTool. The class 
holds values such as its address and bitmask during 
the most recent decoding, as well as a void pointer 
that might point to a matching bit in the GUI. 

All of the GatoROM bits begin in a table of their 
input positions, and transformations (flip, rotate, 
etc) produce a table of bits in the output order. This 
output table is then passed to the parser for decod- 
ing, when the address and bitmask fields of the bit 
class instances are updated to record their logical 
positions. Zorrom does these steps in roughly the 
same order, but by passing values instead of point- 
ers, it does not preserve relationships between the 
inputs and outputs. 
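
The value of passing references rather than values can be shown with a short Python sketch. The `RomBit` class and helper names are hypothetical stand-ins for GatoROM's C++ bit class, and the decoding (row-wise, MSB first) is just one simple layout.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(eq=False)
class RomBit:
    value: int
    row: int                        # physical position in the input table
    col: int
    address: Optional[int] = None   # filled in by the most recent decoding
    mask: Optional[int] = None
    gui_ref: Any = None             # stands in for the pointer to a GUI bit

def make_table(values):
    return [[RomBit(v, r, c) for c, v in enumerate(line)]
            for r, line in enumerate(values)]

def flip(table):
    """Reorder references; the bit objects themselves are shared."""
    return [row[::-1] for row in table]

def decode(table):
    """Decode row-wise, MSB first, recording each bit's logical position."""
    data = bytearray()
    for row in table:
        for i in range(0, len(row) - 7, 8):
            byte = 0
            for k, bit in enumerate(row[i:i + 8]):
                bit.address = len(data)
                bit.mask = 0x80 >> k
                if bit.value:
                    byte |= bit.mask
            data.append(byte)
    return bytes(data)

def physical_bits(table, address):
    """Physical (row, col) positions of the bits that formed one byte."""
    return sorted((b.row, b.col) for row in table for b in row
                  if b.address == address)

table = make_table([[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]])
print(decode(flip(table)).hex())
print(physical_bits(table, 0))
```

After decoding the flipped table, the original input table can still answer which physical bits made up byte zero, which is the kind of question an interactive view needs to ask.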

I wrote earlier that I also wanted something 
like BitViewer's interactive nature in my tool. By 
recording the address and mask of every bit that is 
decoded, my tools can easily show their work. It’s 
no trouble to select the first few bytes of a decoding, 
then ask the tool to highlight the bits of those bytes 
in physical order. 


Specifications

+16 dBM into 600 Ω
+ 25 Mw into 40 Ω headphones
±.25dB 15 Hz - 40 KHz
1dB 5Hz-100KHz
< .01% T.H.D.
Equivalent input noise < -127 dBM
8-12 hours continuous operation on internal 9 volt batteries,
depending on phantom load.


An Excellent Mic Pre-Amp

the AERCO MP-2 is designed with what
we believe to be the very best components
available.


High Efficiency Power

Converter operating at 500 KHz supplies
all the internal power requirements. It
has been designed for minimum noise
generation and maintains a conversion
efficiency well in excess of 90%. Input
power can range from 7 - 20 Volts D.C.
without regard to polarity and with no
electrical connection to the audio
circuit.


Phantom Power

is implemented with the Industry
standard 52 volt circuit for proper
operation of 12 volt and 48 volt microphone
types. Individual jumpers for
each channel disable the phantom
voltage for use in unbalanced environments.


Jensen Input Transformers

have no known rival. Exotic materials
and fanatical devotion combine for
3dB points of 2 Hz and 250 KHz.


High Gain... Low Gain

eight independently selected gain
settings for each channel from 20 to [illegible]
dB. Gains are set by a switch and
network of precision resistors instead of
a pot. This ensures the lowest possible
noise and eliminates the reliability
problems inherent in small pots. The
Switches are accessed through the RCA
jacks.


Extremely Low Noise

We selected the Linear Technology
LT-1028 amplifier on the basis of sonic
clarity and super low noise performance
that will instantly bring a grin to
[illegible]. The facts that it drives [illegible]
and that it consumes just 200 mW
of power are more icing on the cake.


Great Quality - Small Price

We've priced the 2 channel unit at $560
to make you a fast friend. Then we
plan to sell you more neat stuff.


AERCO
Box 18093 Austin, TX 78760
Neighborly greetings to John McMaster for his
groundbreaking work on Zorrom, for helping me get
my lab back together, and for patiently explaining 
all those things of semiconductor reverse engineer- 
ing that I had fallen behind on in the past years. 
And cheers to Vicki Pfau for being smart enough to 
decode the arrangement of the TMP47 font ROM, 
which will soon lead to a general decoder for TMP47 
program ROMs. 

My tools don’t yet solve every ROM that was 
ever manufactured, but I’m happy to say that they 
are now the best tools for any particular ROM ex- 
traction job. They are fast, they hardly ever crash, 
and they run reliably from the command line or from 
an OpenGL GUI, whichever you might prefer. 

Now that I’ve solved that problem, perhaps I can 
get back to finishing my book? 


Controls 1-4 drives
3 inch to 8 inch drives
Shugart compatible
40/80 tracks per side
64K RAM & 8K ROM on board
RGB monitor output


$199 FD-68 Interface
$99 Drive 407/05/00, 5 inch/400 kilobyte
$99 Dual Drive Cabinet and 5 amp Pwr Pack
$3 Per Item S&H
Texas Residents add 5%
VISA/MasterCard add 5%


Enhance the performance of your TS 2068 with the AERCO Disc System. All of the
speed and convenience of a full-out floppy disc system. Load programs at an incredibly
fast 250,000 bits/sec. Fully compatible with all Shugart type drives, including those
already in use with the AERCO 1000 Disc System. The 64K of on-board RAM can be
used as a second bank of system memory or a soon to be released full-blown CP/M
System (version 2.2). The RGB output is crystal clear and rock steady. The power
supply is a 5-amp high efficiency switcher. We offer a variety of other hardware for all
models of SINCLAIR-TIMEX


Floppy Disc Interface
Disc Drives
Power Supplies
Centronics Printer I/O
Dual RS-232C Serial I/O
Direct Video Mod (DV-1)
C ITOH 8510 Printer
C ITOH 7500 Printer
ROM Bd. with Auto Disc Boot
RGB Cable (specify monitor)
CP/M (V. 2.2)


AERCO


ACME ELECTRIC ROBOT CO


Box 18093 Austin TX 78760 


Ph (512) 451-5874 


22:03 Mitra and Mocky: Near-polyglots and Mocks 


Our readers are encouraged to read Abusing File 
Formats (PoC||GTFO 7:6). This is a follow-up with 
better classification, leading to the creation of tools 
to automate the generation of various kinds of weird 
files: mocks, polyglots and near-polyglots. We also 
share a tool, Mitra, to help apply these techniques. 

The basic idea behind abusing files is to make space for
foreign data that parsers will ignore. This
can rely on various features of the targeted file format.


Signature Some formats like MP4 and PostScript 
are parsed from offset zero, but they don’t enforce a 
magic at that offset. As a consequence, it’s possible 
to make an MP4/PostScript polyglot by abusing the
length of an MP4 atom to encode a PostScript dec- 
laration for a dummy function that will cover more 
MP4 structure. 

A polyglot MP4/PS header starting a dummy 
function but also an MP4 free block. 


    /PostScript/  whitespace then line comment
00: 00 00 00  %  f  r  e  e \r \n  %  !  P  S \r \n
    \----MP4---\  Declares a comment of length 0x25

    /PostSc./     Declares a function name
                  then a line comment
10:  /  {  (  % 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 XX  f  r  e  e \r \n  )  }
    \---MP4----\  Declares a comment len

The same bytes look quite different when interpreted
as a PostScript file. On the MP4 side, 00 00 00 XX
declares the length of a free chunk covering the whole
PostScript.


\O\0\0%free 

“LIPS 

/ £(NONONONONONONONONONONONONONONONONONONON xXXfree 
>} 
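
As a rough illustration of the MP4 side of this trick, here is a minimal Python sketch that wraps a PostScript snippet in a free atom and then walks the atoms the way a simple MP4 parser might. The helper names free_atom and walk_atoms, the stand-in ftyp atom, and the PostScript payload are assumptions made for this example. It is not the exact header from the figure, and it is not Mitra's code.

```python
import struct


def free_atom(payload: bytes) -> bytes:
    """Wrap payload in an MP4 'free' atom: 4-byte big-endian size, 4-byte type."""
    return struct.pack(">I4s", 8 + len(payload), b"free") + payload


def walk_atoms(data: bytes):
    """Yield (type, payload) for each top-level atom, trusting declared sizes."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        # Sizes 0 and 1 have special meanings in MP4; this sketch skips them.
        if size < 8 or offset + size > len(data):
            break
        yield kind, data[offset + 8 : offset + size]
        offset += size


# Hypothetical foreign data: a tiny PostScript program.
postscript = b"\r\n%!PS\r\n(hello) print\r\n"
# Simplified stand-in for the real MP4 structure that would follow.
real_atom = struct.pack(">I4s", 8 + 4, b"ftyp") + b"isom"

blob = free_atom(postscript) + real_atom
atoms = list(walk_atoms(blob))
for kind, payload in atoms:
    print(kind, len(payload))
```

The walker never looks inside the free atom, so whatever sits there is invisible to the MP4 side. Nothing at offset zero is checked against a magic, which is the weakness the Signature paragraph describes.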


Starting parsing at offset zero is only a half-measure
against file format abuse. Only when signatures
are enforced at the very beginning will file
format abuse and confusion become impossible.

On the other hand, a signature for every frame,
as in Ogg or ILDA, can be superfluous if the
frames are properly length-defined.


⁶ See “The .xz File Format, v1.0.4.”

by Ange Albertini


Cavities Some formats start with cavities, which 
are totally ignored. We used to call these empty 
spaces. 

A typical example is the ISO format, being a 
raw dump starting with empty sectors. This is com- 
monly used in IsoHybrid to make a single ISO that 
boots either as a thumbdrive or as a CD-ROM. 

Another example is the DICOM format, which 
by courtesy skips the first 128 bytes of the file. A 
magic signature and enforced structure are required 
after that range. This is particularly convenient for 
TIFF files, since unlike most picture formats, the 
eight-byte long Image File Header points to the File 
Directory, which points to the image data, so it’s 
easy to move the big chunks of structure and data 
around the DICOM ones. In short, a TIFF/DICOM
polyglot is easy, and a TIFF/DICOM chimera— 
sharing the same image data—is even possible. 
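
To make the cavity concrete, here is a small Python sketch that places a TIFF header inside the 128 bytes that DICOM skips. The function names and the IFD offset 0x1000 are assumptions chosen for illustration. The result is only a stub that passes both signature checks, not a complete polyglot.

```python
import struct

DICOM_PREAMBLE_SIZE = 128   # bytes skipped by DICOM parsers
DICOM_MAGIC = b"DICM"       # required right after the preamble


def looks_like_dicom(data: bytes) -> bool:
    """Check only the DICOM magic at offset 128."""
    start = DICOM_PREAMBLE_SIZE
    return data[start : start + len(DICOM_MAGIC)] == DICOM_MAGIC


def looks_like_tiff(data: bytes) -> bool:
    """Check only the TIFF byte-order mark and the constant 42."""
    return data[:4] in (b"II*\x00", b"MM\x00*")


def tiff_header(first_ifd_offset: int) -> bytes:
    """Eight-byte little-endian Image File Header pointing at the first IFD."""
    return struct.pack("<2sHI", b"II", 42, first_ifd_offset)


# Hypothetical offset: somewhere past the DICOM structures.
header = tiff_header(first_ifd_offset=0x1000)
stub = header.ljust(DICOM_PREAMBLE_SIZE, b"\0") + DICOM_MAGIC

print(looks_like_tiff(stub), looks_like_dicom(stub))
```

Because the TIFF header is only a pointer, everything else in the TIFF can live wherever the DICOM structures leave room.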


Appended data Most parsers will ignore any 
further data once the file is considered “complete 
enough,” which is determined by whether some kind 
of recovery is required or not. 

Completion might be explicit, via some kind of terminating
marker. Or it might be implicit: the parser stops once enough
data is present, or you end the parsing by intentionally
triggering a parsing bug. An example of triggering
a recursion exhaustion warning can be found in
PoC||GTFO 17’s AGC polyglot.

Data added after the end—and ignored—is 
called appended data, but it’s rarely mentioned ex- 
plicitly in the specifications. It’s simply tolerated. 
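
A quick way to watch a parser tolerate appended data is Python's zlib module, whose decompression object stops at the end of the stream and hands back whatever follows. This is only one example of the behavior; the payload and the appended bytes are made up for illustration.

```python
import zlib

payload = b"hello, neighbors"
stream = zlib.compress(payload)              # a complete zlib stream
appended = b"<script>alert(1)</script>"      # hypothetical foreign data

decoder = zlib.decompressobj()
out = decoder.decompress(stream + appended)

print(out)                  # the original payload
print(decoder.eof)          # the stream was considered complete
print(decoder.unused_data)  # the appended data, untouched by the parser
```

The stream carries its own end marker, so the decoder considers the file complete and never interprets the rest.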

Some formats actively prevent appended data by
having a footer. A footer can be thought of as a specific
structure required to be at the very end of
the file. Examples are ID3v1, a footer that is itself appended
to an MP3/Layer3 stream, and the XZ archive
format, which officially requires a footer to define a
file as complete.⁶

Other formats like DICOM, Ogg, ILDA or
PCAPNG require the whole file to follow a given
structure: they're just pure sequences, sometimes
even requiring a magic signature for each structure,
as ILDA and Ogg do. Their parsers just go on parsing
everything in the file, and return an error when they
reach appended data, since it's invalid.


Metadata Since it's present in the file but not 
needed for parsing or rendering the file's contents, 
metadata is a great source of abuse. Many old school 
formats have fixed length fields, as hard coding was 
the norm back then. 


Comments are typically ignored space, usually
with a length that is declared up front,
and are present in most file formats. Unlike PDF,
XML requires its comments to use the same encoding as the
rest of the file, but in general, comments are just ignored
and preserved, no matter their number, their
length or their content.

Comments are not the only source of abuse. Ex- 
tensible metadata with a user-chosen ID, or fields 
like file names in an archive can also be used to 
store some foreign data. A notable exception is that 
in Gzip, the optional comment and file name are 
null-terminated, which shows that they're intended 
to store standard text, while the also optional Extra 
Field is defined with a 2-byte length. 
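
The Gzip Extra Field can be sketched in a few lines of Python. The snippet below takes a gzip member produced by the standard library, sets the FEXTRA flag, and inserts one subfield holding arbitrary bytes. The function name add_extra_field and the subfield ID "PG" are assumptions made for this example.

```python
import gzip
import struct

FEXTRA = 0x04  # flag bit announcing an Extra Field


def add_extra_field(gz: bytes, foreign: bytes) -> bytes:
    """Return a copy of a gzip member with `foreign` stored in its Extra Field."""
    if gz[:3] != b"\x1f\x8b\x08" or gz[3] != 0:
        raise ValueError("expected a gzip member with no optional header fields")
    # One subfield: 2-byte ID (hypothetical), 2-byte little-endian length, data.
    subfield = b"PG" + struct.pack("<H", len(foreign)) + foreign
    if len(subfield) > 0xFFFF:
        raise ValueError("the Extra Field is limited by its 2-byte length")
    header = gz[:3] + bytes([FEXTRA]) + gz[4:10]
    return header + struct.pack("<H", len(subfield)) + subfield + gz[10:]


original = gzip.compress(b"payload")
foreign = b"\x00\x01binary\x00junk"          # null bytes are fine here
patched = add_extra_field(original, foreign)

print(gzip.decompress(patched))
```

The length-defined field accepts null bytes, which the null-terminated comment and file name fields cannot hold.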

As a side note, metadata may seem like a permanent
risk, and it’s natural to wonder why we
define it officially in every format if it is so
easily abused later. Metadata may not seem
like an initial requirement when the goal is to keep the
format simple, as with the Quite OK Image format, but it is
eventually needed to keep extra information
in the file, which is exactly what happened with
MP3 files.

At the release of l3enc (the original MPEG Layer
3 encoder, in 1994), the files initially had an .l3
extension and no file format whatsoever. They were
pure sequences of Layer 3 frames, each with its own
frame header with no signature, making them hard
to detect and easy to confuse with other data such
as JPEG segments.

Since there was no way to store any metadata 
in L3 files, the compatible ID3v1 footer with a hard 
coded length was unofficially defined. More struc- 
tures were defined in other clumsy ways around the 
L3 stream (Xing, Lame, APE, ...), showing the 
need for proper definition of metadata storage from 
the beginning. ID3v2 eventually defined a header
with a magic, which at last gave a proper format to MP3
streams.
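
For reference, the hard-coded ID3v1 footer is simple enough to parse by hand. The Python sketch below reads the last 128 bytes of a file and splits them into the fixed-length fields. The helper names and the fake audio bytes are assumptions made for this example.

```python
ID3V1_SIZE = 128  # fixed length of the footer


def make_id3v1(title: str, artist: str, year: str) -> bytes:
    """Build an ID3v1 footer: 'TAG' plus fixed-length, zero-padded fields."""
    def field(text: str, size: int) -> bytes:
        return text.encode("latin-1")[:size].ljust(size, b"\0")

    tag = (b"TAG" + field(title, 30) + field(artist, 30) + field("", 30)
           + field(year, 4) + field("", 30) + bytes([255]))
    assert len(tag) == ID3V1_SIZE
    return tag


def read_id3v1(data: bytes):
    """Return the footer fields, or None when the file has no ID3v1 footer."""
    if len(data) < ID3V1_SIZE or data[-ID3V1_SIZE:-ID3V1_SIZE + 3] != b"TAG":
        return None
    tag = data[-ID3V1_SIZE:]

    def text(raw: bytes) -> str:
        return raw.split(b"\0", 1)[0].decode("latin-1").rstrip()

    return {"title": text(tag[3:33]), "artist": text(tag[33:63]),
            "album": text(tag[63:93]), "year": text(tag[93:97]),
            "comment": text(tag[97:127]), "genre": tag[127]}


# Stand-in for a stream of Layer 3 frames; not real audio.
frames = b"\xff\xfb\x90\x00" + bytes(400)
tagged = frames + make_id3v1("Neighborly", "Pastor", "1994")

print(read_id3v1(tagged))
print(read_id3v1(frames))
```

Every field has a fixed size and the footer sits at a fixed distance from the end of the file, so there is no room to extend it.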

It's a shortsighted move to come up with a great 
compression algorithm (e.g., MP3, QOI) and define 
a way to store some data in a file format without 
the ability to extend it with new but optional data 
in the future. Even if it means that these structures
can be abused, you can't have an extensible format
that can't be abused, and if your format can't be extended,
people will extend or fork it anyway, requiring an extra
format that could have been avoided from the beginning.


Wrappending Some formats don't tolerate ap- 
pended data, but they can end with a parasite- 
hosting structure, which acts like appended data, 
just wrapped in a declarative structure. Since most 
data-storing structures have all their declaration be- 
fore their data itself, it behaves like appended data 
from the outside, even if the length of the appended 
data has to be declared somewhere. It looks like appended
data (ignored but tolerated), but it's technically
an appended parasite (declared and skipped).


Zipper Some formats have very strong con- 
straints: a tiny cavity, or very small parasite length 
(256 bytes for a GIF). Rather than parasitizing a 
whole file, let's just add the declaration of the file, 
a declaration of a comment, and then store the rest 
of the file as appended data. 

A zipper is a fabric construct of two sliders where 
each tooth interlocks with the other side's tooth. As 
an analogy, a zipper is a file construct where each 
format comments the other format's elements, and 
each tooth is a parasite for the other format. 

The simplest form of a zipper is made of two for- 
mats: Format A starts at offset zero, tolerates ap- 
pended or wrappended data, and Format B starts 
with a cavity. Both formats can be parasitized. 


[Figure: File A consists of HeadA and BodyA; File B consists of a cavity, HeadB and BodyB.]



We parasitize File A with Head B, adding 
padding if required. We also parasitize File B with 
Body A, wrappending Body B in advance if required. 

[Figure: the parasitized files ParaA and ParaB.]

When we merge these files, it looks like this.

[Figure: the merged Zipper file.]


Zippers combine various format features (cavity,
parasite, appended data) to overcome limitations
and make even more weird formats combine.

Mitra is a tool that combines all this knowledge
for 40+ different formats, generating hundreds of format
combinations with different strategies.⁷

Mitra is a simple tool. It doesn't understand
file format structures; it just contains the minimum
amount of information to identify and parasitize a
file format. It expects standard files as input!


Abuses 


Payload embedding The simplest form of ex- 
ploitation is to just embed a payload that doesn't 
need to be a valid file. In this case, use the --force
command line parameter.

The universal example for that is HTML or 
JavaScript that can be embedded in most file for- 
mats. If the file is too big, the HTML page might 
take too long to load entirely. In that case, use 
JavaScript to break out of the appended data and 
limit the parsing to the web payload only. 


⁷ git clone https://github.com/corkami/mitra
⁸ unzip pocorgtfo22.pdf mocky.py

Mocks While 7:6 covered file type identification,
it didn't cover any exploitation. The easiest way to 
exploit file type identification is just to give a binary 
blob the right signature at the right offset. It could 
even happen accidentally. Such a file is a mock file. 
A simple example is FF D8, a two byte file that can 
be identified as a JPEG image. 
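
A toy identifier shows why mocks work. The Python sketch below checks a handful of signatures at fixed offsets and nothing else. The table is a tiny assumption made for this example and is far simpler than the magic database that file uses.

```python
# (offset, magic bytes, reported type): a deliberately tiny, hypothetical table.
SIGNATURES = [
    (0, b"\xff\xd8", "JPEG"),
    (0, b"%PDF-", "PDF"),
    (0, b"PK\x03\x04", "ZIP"),
    (128, b"DICM", "DICOM"),
]


def identify(data: bytes) -> list:
    """Return every type whose signature is present at its expected offset."""
    return [name for offset, magic, name in SIGNATURES
            if data[offset : offset + len(magic)] == magic]


mock = b"\xff\xd8"   # two bytes, no image data at all
print(identify(mock))
print(identify(b"not a picture"))
```

The identifier never asks whether anything after the signature makes sense, so a blob with the right bytes in the right place is enough.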


Polymocks It’s also interesting to just add a 
mock signature at a given offset in a valid file via 
any of the previously mentioned techniques. 

Mocky is a tool that uses the Mitra library to in- 
sert specific filetype signatures at specific offsets to 
create polymocks.⁸ Since programs like file have
so many signatures that they are scanned by alpha- 
betic order of their category, it's possible to predict 
which detection will be returned first, and the order 
might not be what you'd expect from a threat model 
perspective. 

Of course, it's possible to cram as many signa- 
tures as possible within the constraint of the target 
format, such as this issue, a valid PDF with many 
extra mock signatures stored in a standard stream 
object. Here is an example of such a polymock file, 
with many filetype detections yet no valid content: 


[Figure: hex dump of the polymock, with each mock signature annotated
as offset+length in hexadecimal.]

 2+2  Arj
 4+2  JPEG 2000
 6+2  UnicOS
 8+4  Symbian
 C+4  Sndh
10+4  Nintendo Switch
14+4  Zoo
18+4  Nintendo Wii
1C+4  Rar v1.4
20+4  AFS
24+4  zImage
28+4  PkZip
2C+4  PolyTracker
30+6  SymbOS
36+6  7-zip
3C+4  SoundFX
40+4  VirtBox
46+2  Int 21h
48+4  PkZip
4C+4  ScreamTracker
50+8  Rar v5
??+8  Plot84
64+7  Rar v4
6D+5  EZD Map
72+6  Xz
78+4  LZ4
7C+4  LZ4
80+4  DICOM
84+C  PDF


Running file with --keepreading gives an im- 
pressive list of detected formats: 


Plot84 plotting file 

SymbOS executable v7.z, name: 
Old EZD Electron Density Map 
Zoo archive data, vj., modify: 
Symbian installation file 
Scream Tracker Sample adlib drum stereo... 
Poly Tracker PTM Module Title: "MZ... 
SoundFX Module sound file 

Nintendo Wii disc image: "NXSB... 

DICOM medical imaging data 

Linux kernel ARM boot executable... 
VirtualBox Disk Image, minor 8653 (MZ... 
JPEG 2000 image 

ARJ archive data 

COM executable for DOS 

unicos (cray) executable 

data 


v78.88-* 


Mocky has a --combine flag to try to insert
as many signatures in a file as possible.
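
The idea of cramming signatures into one buffer can be sketched as a greedy placement that skips any signature overlapping one already placed. This is an assumption about how such a combination could work, not Mocky's actual algorithm. The candidate list is hypothetical, with offsets loosely borrowed from the polymock figure.

```python
def combine(size: int, candidates):
    """Greedily place (offset, magic, name) signatures that don't overlap."""
    buf = bytearray(size)
    used = [False] * size
    placed = []
    for offset, magic, name in candidates:
        end = offset + len(magic)
        if end > size or any(used[offset:end]):
            continue                      # out of range, or clashes with an earlier one
        buf[offset:end] = magic
        used[offset:end] = [True] * len(magic)
        placed.append(name)
    return bytes(buf), placed


candidates = [
    (0x00, b"MZ", "DOS/PE"),
    (0x00, b"\xff\xd8", "JPEG"),          # same offset as MZ, so it is skipped
    (0x28, b"PK\x03\x04", "PkZip"),
    (0x36, b"7z\xbc\xaf\x27\x1c", "7-zip"),
    (0x80, b"DICM", "DICOM"),
]

blob, placed = combine(0x90, candidates)
print(placed)
```

Whether a given tool honors a signature at a given offset depends on its own magic database, so a real polymock has to be checked against the detector you care about.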

file has an extra weakness: it has special
support for tar files before any other format, and it
identifies tar files not by their magic, but by the validity
of their header checksum, even without any tar
signature in the file. This is used in this issue too,
so even though the file is a standard PDF starting with
a generic PDF signature, file with no parameter
sees it as a tar archive.

Mocky will adjust the tar checksum if used for
a polymock file. Here is such an empty mock.tar
file, detected generically as a tar archive even though it
contains no signature at all:


000: 00000000 00000000 00000000 00000000 
090: 00000000 400 00 00000000 
1F0: 00000000 00000000 00000000 00000000 
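
The value 400 in the dump can be reproduced by hand. A tar header checksum is the sum of all 512 header bytes, with the eight bytes of the checksum field itself counted as ASCII spaces, and it is stored as octal text. The Python sketch below uses helper names of our own choosing and mirrors the dump by padding the field with null bytes.

```python
BLOCK_SIZE = 512
CHECKSUM_OFFSET = 148   # 0x94, inside the row starting at 0x90
CHECKSUM_SIZE = 8


def tar_checksum(header: bytes) -> int:
    """Sum the header bytes, counting the checksum field as eight spaces."""
    if len(header) != BLOCK_SIZE:
        raise ValueError("a tar header is one 512-byte block")
    end = CHECKSUM_OFFSET + CHECKSUM_SIZE
    return (sum(header[:CHECKSUM_OFFSET])
            + 0x20 * CHECKSUM_SIZE
            + sum(header[end:]))


def with_checksum(header: bytes) -> bytes:
    """Return the header with its checksum field filled in as octal text."""
    digits = format(tar_checksum(header), "o").encode("ascii")
    field = digits.ljust(CHECKSUM_SIZE, b"\0")
    end = CHECKSUM_OFFSET + CHECKSUM_SIZE
    return header[:CHECKSUM_OFFSET] + field + header[end:]


mock = with_checksum(bytes(BLOCK_SIZE))   # an otherwise empty header
print(mock[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4])
print(oct(tar_checksum(mock)))
```

For an all-zero header the sum is just eight spaces, 8 x 0x20 = 256, which is 400 in octal.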


Adding a valid tar checksum to the previous
polymock example will indeed return a tar filetype,
unless the --keepreading parameter is used,
despite all the other signatures present.


Near-polyglots Formats that require a different 
signature at the same offset can't be combined in a 
polyglot. However their combination can still be ex- 
ploited in different conditions. Near polyglots are 
files that are almost polyglots, except that some 
bytes have to be replaced so that the file type 
changes. 

This change could happen over the network if 
some packets arrive in a different order. It could 
also happen due to weak bits, leading to different 
contents. And it can happen via a cryptographic 
operation, in which case you can call them crypto-polyglots.

One such use case is AngeCryption, introduced
in PoC||GTFO 3:11, where I demonstrated
abusing the Initialization Vector of CBC, CFB or
OFB block modes to replace the first block of the
crypto-polyglot. A new abuse of pseudo-polyglots is
presented in the TimeCryption article on page 30 of
this release.

To generate a near polyglot, you need very
light constraints. Format A can be parasitized, and
Format B can start at the same offset and needs to
tolerate appended or wrappended data. Technically,
generating a near polyglot is like parasitizing a format,
ignoring that they both start at overlapping
offsets, and keeping the head.


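[Figure: File A (HeadA, BodyA) and File B (HeadB, BodyB), both starting at the same offset.]

Just parasitize File A with Body B and keep
Head B as the overlap:

[Figure: the near polyglot NearP, built from File A, with an overlap that holds either HeadA or HeadB.]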


The minimum length overlap is basically the 
number of bytes where you can declare a file then 
have some unparsed space, either naturally or by 
declaring a comment. This value can change drasti- 
cally between file formats, as shown on page 15. 

For example, it's 1 for PostScript because you
can declare a line comment with %, so provided
there's no encoded newline after decryption, this will
be valid ignored space. The PE file format’s minimum
overlap is two bytes, M Z, because you
can abuse the DOS header, limiting you to a 58-byte
parasite.

A JPEG header with comment declaration is FF
D8 FF FE XX YY, which is six bytes, with XXYY being
the length of the comment in big endian. However,
if you need a 0x3489-long comment, a 0x35??-long
comment will do the trick, so you don't have to
bruteforce the YY byte. If you feel lucky, you might


[Table: near-polyglot support for PS (minimum overlap 1*), PE (2@) and JPG (4+)
against other formats, which are grouped by minimal start offset (1 to 226 bytes),
variable offset, and unsupported parasite.]

[the table could go on but would take too long to bruteforce]

X: automated    ?: likely possible
M: manual       !: unknown

* Hack that relies on line comments with GhostScript. Requires the parasite not to contain any new line
after encryption.

@ Hack relying on overwriting the DOS Header, therefore restricting the parasite space to offsets 2-60.

+ Signature, comment declaration and length are two bytes apiece. To specify them all is six bytes, but if
we round up the big-endian length and leave its low byte uncontrolled, we only need five. And if we
leave the length entirely uncontrolled, we only need to fix four bytes.

Figure 3: Minimal start offsets of file formats, and exploitation via near polyglots



also gamble on the length and not bother to brute- 
force XX either. 
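
The rounding trick for the JPEG comment length can be written down as a short Python sketch. The function names are ours, and the only claim is the arithmetic: pick a high byte XX such that every possible low byte still yields a length of at least what you need.

```python
import struct


def jpeg_comment_header(length: int) -> bytes:
    """FF D8 (start of image), FF FE (comment), then a big-endian length."""
    return b"\xff\xd8\xff\xfe" + struct.pack(">H", length)


def safe_high_byte(needed: int) -> int:
    """Smallest XX such that every length XX00..XXFF is at least `needed`."""
    high = (needed + 0xFF) >> 8
    if high > 0xFF:
        raise ValueError("comment too long for a 2-byte length")
    return high


needed = 0x3489
full = jpeg_comment_header(needed)              # six controlled bytes
xx = safe_high_byte(needed)                     # 0x35 for this example
five = b"\xff\xd8\xff\xfe" + bytes([xx])        # five controlled bytes, YY left to chance

print(full.hex(), hex(xx), five.hex())
```

With the low byte left uncontrolled, the comment may end up to 255 bytes longer than required, so the parasite has to be padded to match whatever length comes out.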

Mitra can generate such files with the --overlap 
parameter. It keeps the overlap's content in the file- 
name as well as the offsets where the content changes 
formats, to be re-used later by AngeCryption or 
TimeCryption scripts. 


Ambiguity Files that are interpreted differently
depending on the parser are what we now call ambiguous,
and previously called schizophrenic, werewolves or
shapeshifters. There are plenty of sources of this
ambiguity.

When a value such as a pointer never changes 
across standard files, it's tempting for a parser to 
simply ignore it. Putting some contents under un- 
usual conditions while putting other contents under 
the typical conditions might reveal a difference be- 
tween the two parsers. 

Sometimes a value is represented twice. For ex- 
ample, a buffer with a declared length might also 
end with a null terminator. What if that termina- 
tor happens earlier than the declared length? Which 
length value is the real one? Or if you declare the 
same value twice, and there isn't an error, does the 
first or the second declaration take priority? 
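
The length-versus-terminator question is easy to demonstrate with two toy parsers. Both read the same hypothetical record, a 2-byte little-endian length followed by data, but one trusts the declared length and the other stops at the first null byte.

```python
import struct


def parse_by_length(buf: bytes) -> bytes:
    """Trust the declared 2-byte little-endian length."""
    (length,) = struct.unpack_from("<H", buf, 0)
    return buf[2 : 2 + length]


def parse_by_terminator(buf: bytes) -> bytes:
    """Ignore the declared length and stop at the first null byte."""
    return buf[2:].split(b"\0", 1)[0]


# The declared length covers 11 bytes, but a terminator appears after 4.
record = struct.pack("<H", 11) + b"safe\0hidden"

print(parse_by_length(record))
print(parse_by_terminator(record))
```

Both parsers are defensible readings of the same bytes, and a file that exploits the gap shows one content to each of them.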




⁹ git clone https://github.com/nneonneo/universal-doom

If you corrupt a format on purpose and the
parser tries to rebuild the file, what does it try first?
What if you put a valid file structure in a comment?
Such recovery algorithms are typically not officially
specified, so each developer might do it differently.

Some formats extend older formats, so that both
the old and the new formats are present in the file.
These formats are naturally ambiguous at a format
level, and we might call them ambiguous polyglots.

A widespread example is the Portable Exe- 
cutable, defined as an extension of the DOS format. 
Precious few PE files—such as regedit95.exe—
have a meaningful DOS payload. Valid PE files are 
expected to just have the same DOS stub with no 
unique code. 

Robert Xiao proved us wrong by crafting a uni- 
versal Doom binary, which works from DOS 6 to 
Windows 10, both as valid DOS and PE payloads in 
the same file.⁹ This is something like a father and
son having the same name, with no distinctive suffix 
whatsoever. 

Ciphertexts can be ambiguous too, even with
authenticated encryption! Check the TimeCryption
article on page 30 of this issue for abuses of GCM,
GCM-SIV and OCB3.


Collisions PoC||GTFO 19:05 covered a lot of details
for exploiting hash collisions. However, tar.gz
and DOCX (ZIPped XML)—which were initially 
thought to be unexploitable—are properly dealt 
with and explained in the Inside Out article on 
page 19. 


Conclusion 


With basic knowledge of file format identification
and abuse, Mitra can try different strategies and
generate many forms of file abuse: payload embedding,
mock files, polyglots and pseudo-polyglots.

Pseudo-polyglots are the unified form of file format
abuse to be combined with cryptographic operations.
They include both AngeCryption, covering
ECB, CBC, CFB, OFB modes, and TimeCryption,
covering CTR, OFB, GCM, OCB3, GCM-SIV
modes.

Extensions of Mitra might cover ambiguous files 
with standard strategies, hash collisions and hash 
collisions over different formats. 


22:04 More Letters from Screwtape 


by the Demon Wormwood, and certainly not by Manul Laphroaig 


My good neighbors, 

Some of you surely remember those letters from 
a certain Uncle Screwtape to his nephew Wormwood
that C.S. Lewis published eighty some years ago.
Though some discount those letters as apologetics,
a believer's fictional account of demons discussing
the best way to corrupt a well meaning (but poorly 
behaving) soul into perdition, I’ve often wondered 
with which modern sins Screwtape might be corrupt- 
ing his patients these days. 

Imagine my surprise when a mistake of the post 
sent the following to my door, which I reproduce 
faithfully and without redaction or comment. 


-PML 


My dearest nephew Malört,

I'm overjoyed to have heard that your patient 
still spends long, wasted days on social media, see- 
ing that others have adventures but never partici- 
pating in them first hand. The cat videos worry me 
a little, but so long as he can't give the cat noms or 
scritches I guess it can't feed his soul. 

Do be careful, though. The same Internet that 
feeds your patient an endless supply of computer-generated
voices reading forum posts over a video
game is entirely capable of giving him more danger- 
ous things. It can teach him to repair a car, and it 
has all the novels that we worked so hard to have 
censored back in my uncle Screwtape's day. 

Censorship back then was the real deal. We had
so much fun having our patients light a pyre and 
tossing books into it, that we eventually forgot the 
whole point was to keep the books from being read. 
These days, the Opposition has perverted our fine
tradition into something called Banned Book Week, 
where they give away free copies of the books we 
worked so hard to squash! One of them even por- 
trays your great grandfather Behemoth as a pudgy 
cat, thrown out of street cars after paying his fare 
and getting into a shoot-out with the NKVD. How 
insulting. 

Keep me apprised of your patient's progress, and 
be sure to watch for any signs of his finding anything 
useful out there. 


In service of our Lord Below, 
—Wormwood 
Malört, we have to talk.

I read in your last letter that your patient has declared
himself to be a “computer programmer,” and
that his television set has been off for a week while 
he repeatedly stumbled through the same chapter of 
a 21 Days book. 

You needn't worry that two more weeks will de- 
stroy your careful work in maintaining his illiteracy, 
but two more months might see you transferred to 
Search Engine Optimization or some other depart- 
ment with no souls to corrupt. Pay attention! 

A number of strategies might work here, but I 
heard of an excellent new one from your cousin, Le- 
gion. The idea is not to dissuade your charge from 
learning to program, but to slowly twist the idea of 
programming until he learns nothing useful. 

Begin with the choice of language. Ages ago, 
we'd start endless debates about whether BASIC 
was harmful and whether Pascal was for idiots. That 
kept good engineers from learning to read spaghetti 
code or use a language with safe strings, but now 
language arguments are going out of fashion. No one 
cares to debate them endlessly, just as the humans 
have largely forgiven one another for using different 
text editors. 

No, the trick these days is Artificial Intelligence. 
They might call it Large Language Models, and they
recently called it Machine Learning, but the beauty 
is that the user feels like he's programming a com- 
puter while never actually writing any code. 

So direct him back to the thirty-second videos, 
and whisper in his ear that “prompt engineering” 
is the bee's knees. Get him to scroll endlessly, and
know that if he does try some stuff on the prompt, 
he will never see the endless gigabytes of linear al- 
gebra beneath it all. It will feel like it makes sense, 
at least for thirty seconds, and then there will be 
another video. 


Your uncle, 
—Wormwood 


Nephew Malört, 

You are awfully worried that your patient has 
joined a hackerspace, and not without reason. After 
that gambit of yours to make him unemployed for a 
few months, he now has time on his hands to join a 
club, and there's danger that club might teach him 
something. 

To prevent this, you just need to make the space
a hassle for him instead of an inspiration. Take sol- 
dering for an example: if he keeps at it, he will quite 
soon become good at it. And if he gets good at it, 
he'll be able to assemble surface mount kits that the
others cannot, which might give him the confidence 
to design his own. This can't be allowed to continue! 

So whisper a little in his ear. Make him turn the 
iron too hot, or let it crust up overnight when no 
one is watching, to get those barnacle covered tips 
that make an expert struggle. 

If he uses enough flux, tell him it's too much, and 
if he really uses too much that's not a problem, so 
tell him it's an embarrassment to his grandfather's 
grandfather that so much flux is wasted on an LED 
throwie. 

When he makes a reparable mistake, like using 
the wrong resistor value, be sure to fill him with 
shame. And if no magic smoke escapes, double up 
on the shame, as if he's the very first to have an 
LED that's a little bright or a little dim on account 
of its series resistor. 

When you come visit the eighth circle, we might 
discuss other ideas. My grandfather once got some 
monks to fight for a century over whether they were 
"of" the Enemy or “with” the Enemy, and I bet we 
could trigger a similar fight between 60/40 and lead- 
free solder. 


—Wormwood 
Nephew Malört,


In your last letter, you made smug references 
to your patient “blinking LEDs with an Arduino,” 
and I worry that you don't understand how serious 
this might be. Arduinos might use 8-bit microcon- 
trollers, but they are programmed in an easy dialect 
of C! 


And worse, Arduino is a dialect of C for which 
a thousand convenient examples are shared without 
intimidation. If your patient first blinks LEDs, he 
might later blink them in sequence, or display the 
temperature on them. Pretty soon he might extend 
the examples with original code, and I worry that 
you might make side remarks about this too, ignor- 
ing the danger. 

For the danger in these blinky lights is that they 
are projects. In the same way that one might fail 
to learn Spanish from a book for years, but quickly 
learn the basics when there's no other way to buy 
food, a project has the power to make a boring lan- 
guage exciting. Your same patient—the one who 
a few short months ago wasted a month without 
getting to the second chapter of his 21 Days book— 
might soon find himself giving a shit about C.


Once he gives a shit, how long do you expect 
it might take him to learn a language with just 32 
keywords? 


—Wormwood 


Uncle Wormwood, 

I write to you just before sending my resigna- 
tion to Our Father Below. The Patient has indeed 
picked back up his 21 Days book, and this afternoon 
I overheard him explaining pointer arithmetic to a 
friend. 

I'm ashamed to say, the explanation was right. 


Your disgraced nephew, 
—Malört




22:05 Inside Out; or, 


Abusing archive file formats. 


We have previously demonstrated hash 
collisions in documents with blocks of 64 
bytes, such as the great MD5 pileup in 


PoC||GTFO 19:05. This used colliding, aligned
blocks in pocorgtfo19.pdf to match the hash
of pocorgtfo19.exe, pocorgtfo19.png and
pocorgtfo19.mp4. That is to say, these files were
not identical, but they did share an MD5 hash.


This research started with an incorrect assumption
that Zip, TAR, and GZIP couldn't be gener-
ically exploited with collisions. Even with the 
almighty chosen-prefix collisions, I thought that Zips 
may not work, XML will never work, and GZIP will 
always trigger a warning. 


Zip is the most collision unfriendly of standard 
file formats: bottom-up, pointers everywhere, dupli- 
cated data... Since they are officially parsed bottom- 
up, you can’t even use a Chosen Prefix Collision on 
a pair of Zip files if their size difference is bigger 
than 64 kb, as the EoCD (end of Central Directory 
record) of the smaller archive will be too far from 
the end of the file to be found, thus making the file 
invalid. 
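To make the bottom-up constraint concrete, here is a minimal Python sketch of how a reader might look for the EoCD. It assumes the usual layout: a 22-byte fixed record, followed by a comment whose length is a 16-bit field, so the signature can only sit in the last 22 + 65535 bytes. The helper name `find_eocd` is made up for this illustration, and real parsers differ in their details.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN = 22           # size of the fixed part of the record
MAX_COMMENT = 0xFFFF    # the comment length is a 16-bit field


def find_eocd(blob: bytes):
    """Return (offset, entry_count, cd_size, cd_offset), or None if not found."""
    # Only the tail of the file is searched, which is why too much
    # trailing data makes the archive unreadable.
    window_start = max(0, len(blob) - (EOCD_MIN + MAX_COMMENT))
    pos = blob.rfind(EOCD_SIG, window_start)
    if pos < 0:
        return None
    (_sig, _disk, _cd_disk, _n_disk, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from("<4sHHHHIIH", blob, pos)
    return pos, n_total, cd_size, cd_offset


# Build a tiny archive in memory with the standard library.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world")
archive = buf.getvalue()

print(find_eocd(archive))                     # found near the end
print(find_eocd(archive + b"\x00" * 70000))   # pushed out of the window
```

Appending 70,000 bytes plays the role of the larger file in a chosen-prefix pair: the record of the smaller archive falls outside the search window.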


On top of that, some critical data (such as file 
length, name, and content CRC32) is duplicated in 
the Local File Headers and in the Central Directory 
for a given file, which means it is present before and 
after the file contents—thus preventing any generic 
exploitation. 


And unlike most archive formats, Zip is a tree of 
pointers between structures instead of sequences, so 
any size change of file content will propagate on the 
rest of the file: the last structure of the file contains 
a pointer and the number of archived files. 

XML files also don’t play nice with collisions: 
CDATA comments are defined in XML files, but they 
have to use the defined encoding, which is incom- 
patible with the randomness of collision blocks. 

XML files don’t tolerate appended data either. 
It’s another totally collision-unfriendly format. 

DOCX files are Zip archives containing XML 
files and various data files, such as JPEG and PNG 
images. 


¹⁰ unzip pocorgtfo22.pdf APPNOTE-1.0.txt
¹¹ git clone https://github.com/resilar/crchack.git

by Ange Albertini


Root file In DOCX files, the /_rels/.rels file
plays a very special role. It's the root of the docu- 
ment, which points to other XML files of the docu- 
ment. It defines the relationships between the files. 

You can move the files around provided you up- 
date the root, which requires a hard-coded path and 
filename in the archive. You can also make two doc- 
uments co-exist in the same archive, pointing to ei- 
ther in the root file. A valid strategy to generically 
collide two documents seems possible. 


Collision blocks You can't store the collision 
blocks after the XML content, since that would in- 
validate the root file's XML structure. And we can't 
easily forge the CRC of collision blocks, so we can't 
store them in the contents of a dummy file. 

However, we can store the collision blocks in the 
Extra Field of a file, since Extra Fields don't have 
a CRC. Extra Fields were defined in 1990, in the 
very first version of the specifications.¹⁰ They are
commonly used and very extensible, so many im- 
plementations both ignore this field and preserve it. 
Extra Fields are stored before file contents, so they 
can't be stored in the Local File Header of the root; 
a dummy file stored after the root file can be used 
as a host for them. 
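As a rough illustration of why the Extra Field is a convenient host, the following Python sketch stores two different blobs in the Extra Field of an otherwise identical dummy entry. The header ID 0x6363 and the name `dummy.bin` are arbitrary choices for this example. Note one difference from the layout described above: Python's `zipfile` copies `ZipInfo.extra` into both the Local File Header and the Central Directory, so this only demonstrates that the field is not covered by the CRC, not the exact placement used for the collision.

```python
import io
import struct
import zipfile
import zlib


def archive_with_extra(blob: bytes) -> bytes:
    """Build a one-file stored archive carrying `blob` in an Extra Field."""
    info = zipfile.ZipInfo("dummy.bin")
    # An Extra Field is a list of (header ID, length, data) records.
    # 0x6363 is an arbitrary ID chosen for this sketch.
    info.extra = struct.pack("<HH", 0x6363, len(blob)) + blob
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(info, b"same contents")
    return buf.getvalue()


first = archive_with_extra(b"A" * 128)
second = archive_with_extra(b"B" * 128)

for raw in (first, second):
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        entry = zf.getinfo("dummy.bin")
        # testzip() re-checks every CRC and returns None when all match.
        assert zf.testzip() is None
        print(hex(entry.CRC), len(entry.extra))
```

Both archives differ in their bytes yet report the same CRC for the entry, since the CRC is computed over the file contents only.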

It's easy to force the same length for the root 
file. We just need to choose two close paths for each 
document. Storing them rather than compressing 
them guarantees the lengths to be identical and pre- 
dictable. 


CRC You need to keep the root file CRC con- 
stant despite the collision blocks, since the CRC is 
duplicated near the end of the file in the Central 
Directory. 

Forging a CRC is easy, but CRCHack makes it 
super easy!¹¹ Just specify the bits you want, and
it instantly gives you the requested output with the 
requested CRC32 without any encoding violation. 

As an example, we now demonstrate forging a
CRC with ASCII characters.

$ cat ascii
<!--ABCDEF-->
$ crchack \
  -b 4.0:+.8*6:1 -b 4.1:+.8*6:1 \
  -b 4.2:+.8*6:1 -b 4.3:+.8*6:1 \
  -b 4.4:+.8*6:1 -b 4.5:+.8*5:1 \
  ascii 0xdeadf00d
<!--tuI_\Y-->

Only with the uppercase bit of letters:

$ cat letters
<!--THESEKINDSOFCRCAREVERYIMPRESSIVE-->
$ crchack -b 4.5:+.8*32:.8 letters 0xcafebabe
<!--thEsEKIndsOFcRcAReVEryiMPREssIVe-->
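The reason this is easy is that CRC32 is linear over GF(2): for messages of equal length, the effect of flipping several bits is the XOR of the effects of flipping each one alone. The Python sketch below uses that property to pick which of the 32 case bits to flip, using Gaussian elimination. It is not how CRCHack is implemented, only a small stand-in for the same idea, and the function name `forge_case` is made up for this example.

```python
import zlib


def forge_case(msg: bytes, positions, target: int):
    """Flip the case bit (0x20) of some bytes at `positions` so that
    zlib.crc32 of the result equals `target`. Returns None if impossible."""
    base = zlib.crc32(msg)
    # CRC difference caused by flipping each candidate bit on its own.
    effects = []
    for p in positions:
        flipped = bytearray(msg)
        flipped[p] ^= 0x20
        effects.append(zlib.crc32(bytes(flipped)) ^ base)

    # Gaussian elimination over GF(2). Each basis entry is keyed by its
    # highest set bit and remembers which flips were XORed to build it.
    basis = {}
    for i, vec in enumerate(effects):
        mask = 1 << i
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, mask)
                break
            bvec, bmask = basis[top]
            vec ^= bvec
            mask ^= bmask

    # Express the wanted difference as a XOR of basis vectors.
    want = base ^ target
    chosen = 0
    while want:
        top = want.bit_length() - 1
        if top not in basis:
            return None
        bvec, bmask = basis[top]
        want ^= bvec
        chosen ^= bmask

    out = bytearray(msg)
    for i, p in enumerate(positions):
        if (chosen >> i) & 1:
            out[p] ^= 0x20
    return bytes(out)


message = b"<!--THESEKINDSOFCRCAREVERYIMPRESSIVE-->"
forged = forge_case(message, range(4, 36), 0xCAFEBABE)
print(forged, hex(zlib.crc32(forged)))
```

The input here has no trailing newline, so the chosen letters need not match the transcript above, which operates on a file.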


So now we have two versions of the root file
with the same CRC and the same length and, via a
dummy file whose Extra Field contains HashClash
collision blocks, two Local File Headers that give
both archives the same MD5.¹²


Results Unlike most reusable generic collision 
prefixes with a header and no body, this actu- 
ally gives us two reusable generic collision pre- 
archives that are totally valid and manipulatable 
with standard tools. Provided you're careful with 
timestamps—either ignoring them in the source files 
or recompiling within two seconds—doing the same 
operations on both pre-archives will maintain the 
equality of hash values of both files, which is nice 
and very unusual. 

Even better, deleting any archived files beside 
the root and the dummy collision block file will re- 
vert to the original hash values without any further 
modification required! Who would have expected 
that standard Zip tools could give you predictable 
hash values? 


$ md5sum docx*zip
6c33d52590ff0bb0cc8cdafe6aa5153b *docx1.zip
6c33d52590ff0bb0cc8cdafe6aa5153b *docx2.zip
$ zip -oX11 docx1.zip zinsider.py
  adding: zinsider.py (deflated 64%)
$ zip -oX11 docx2.zip zinsider.py
  adding: zinsider.py (deflated 64%)
$ md5sum docx*zip
d12044feee801ad0530a911fa7f18db5 *docx1.zip
d12044feee801ad0530a911fa7f18db5 *docx2.zip
$ zip -d docx1.zip zinsider.py
deleting: zinsider.py
$ zip -d docx2.zip zinsider.py
deleting: zinsider.py
$ md5sum docx*zip
6c33d52590ff0bb0cc8cdafe6aa5153b *docx1.zip
6c33d52590ff0bb0cc8cdafe6aa5153b *docx2.zip

¹² git clone https://github.com/cr-marcstevens/hashclash


Supported formats This trick is applicable to 
any file format made of a Zip-ed XML with a root 
file. It works for .docx, .pptx, and .xlsx from 
Office, for the open container format in ePub, and 
for other open packaging conventions, such as .3mf 
for 3D manufacturing and the XML Paper Specifi- 
cation, .xps and .oxps. 

Corkami collisions' zInsider makes it possible
to instantly collide any of these formats, with pre-
computed prefix archives.¹³

This is easy to extend to any other similar format,
but a new prefix pair must be recomputed for
any new format.

Some formats like Quake’s PK3 aren't ex- 
ploitable: they don't have a root file to abuse. The 
Open Document Format requires its root file to
mention every other file, which isn't generic. APK, 
JAR, and XPI are even worse: they require all the 
other files’ hashes! 


Gzip 


TAR files have no room for any abuse: pure se- 
quences of headers with hardcoded size and offsets, 
then file contents. No declared lengths, no skip- 
pable content. You can use chosen-prefix collisions 
on them, but that’s it: nothing generic. 

Gzip doesn’t seem to be playing nice with hash 
collisions either: any extra data is placed before the 
compressed file contents, and appended data typi- 
cally triggers a warning and is not taken into account 
for parsing anyway. Gzip collisions are possible, but 
not in a generic way. 

However, while most Gzip files start with the 
typical 1F 8B structure (called a member), it’s actually
specified that a Gzip file can contain several of
these members, in which case the data of each will 
be decompressed and concatenated. So a member 
with no compressed data but with extra data acts 
as a comment that can be parasitized, albeit quite 
a complex one. 

Since the length of the Extra Field is stored on 
two bytes in little-endian before the Extra Field itself,
it’s even exploitable with UniColl!¹⁴
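A short Python sketch may help to see the "member as comment" idea in action. It hand-builds a member with the FEXTRA flag set, an arbitrary Extra Field, and an empty Deflate stream, then prepends it to a normal member. The helper name `comment_member` is made up for this example, and the Extra Field bytes are not split into well-formed subfields, on the assumption that the decoder simply skips the declared length, as Python's `gzip` module does.

```python
import gzip
import struct
import zlib


def comment_member(payload: bytes) -> bytes:
    """A Gzip member that decompresses to nothing and hides `payload`
    in its Extra Field."""
    if len(payload) > 0xFFFF:
        raise ValueError("the Extra Field length is a 16-bit value")
    header = (
        b"\x1f\x8b"            # magic
        + b"\x08"              # compression method: Deflate
        + b"\x04"              # flags: FEXTRA
        + b"\x00\x00\x00\x00"  # mtime
        + b"\x00"              # extra flags
        + b"\xff"              # OS: unknown
        + struct.pack("<H", len(payload))  # XLEN, little-endian
        + payload
    )
    # A raw Deflate stream that encodes zero bytes of output.
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
    body = deflater.compress(b"") + deflater.flush()
    trailer = struct.pack("<II", zlib.crc32(b""), 0)  # CRC32 and size of nothing
    return header + body + trailer


hidden = bytes(range(256)) * 4          # stand-in for collision blocks
blob = comment_member(hidden) + gzip.compress(b"visible payload")
print(gzip.decompress(blob))
```

The decompressed output contains only the second member's data, while the first member's Extra Field rides along untouched.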


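¹³ git clone https://github.com/corkami/collisions; find collisions -name zinsider.py
¹⁴ git clone https://github.com/corkami/collisions; find collisions -name unicoll.md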


So a generic reusable hash collision for Gzip is ac- 
tually possible via a classic sequence of comments. 
First one comment to align the rest of the file to col- 
lision block boundaries, then one comment whose 
length is variable—its encoded value will be over- 
lapping with one of the differences in the collision 
blocks—and then we start two chains of comments 
to toggle one payload or the other, exactly like we 
did for JPEG, MD5, or SHA1.


Colliding GZIPs like JPEGs As with JPEG, we
have the limit that an Extra Field can't be bigger than
64 kb, but recompressing data in chunks of 64 kb is 
much easier with Deflate than with JPEG! Since the 
decompressed data of all members is concatenated, 
we just need to cut the archived data in chunks. 
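Cutting the data into members is a few lines of Python. In this sketch the chunk size of 60,000 bytes is an assumption, picked so that even incompressible input yields members below the 64 kb that an Extra Field can skip over; the function name `chunked_gzip` is made up for the example.

```python
import gzip

CHUNK = 60_000  # assumption: leaves headroom under the 16-bit limit


def chunked_gzip(data: bytes, chunk: int = CHUNK) -> list:
    """Compress `data` as a list of independent Gzip members."""
    members = [gzip.compress(data[i:i + chunk])
               for i in range(0, len(data), chunk)]
    # Every member must fit inside one Extra Field to be skippable.
    assert all(len(m) <= 0xFFFF for m in members)
    return members


data = bytes(range(256)) * 1000          # 256,000 bytes of sample input
members = chunked_gzip(data)

# Concatenated members decompress to the concatenated chunks.
assert gzip.decompress(b"".join(members)) == data
print(len(members), max(len(m) for m in members))
```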

This idea isn't new. Some formats like BGZIP
(2008) chunk the data in several members and store 
an index in the extra field, making it easier to de- 
compress some contents separately while maintain- 
ing a standard Gunzip-compatible structure. This 
is a common source of multi-member Gzip files. 

So it gives us reusable hash collisions for anything
that relies on Gzip as its outer compression layer, such
as .tar.gz or SVGZ. As long as the data is decompressed,
the structure of the outer archive can be
freely modified.

However, some programs like Inkscape use their 
own lightweight implementation of Gzip, which 
doesn't support files made of several members, so 
our collision strategy will not work in these excep- 
tional cases. 


Conclusion 


While Zip, XML, GZIP, and TAR seemed very hos- 
tile to collisions, combining several tricks made it 
possible to get generic reusable hash collisions for 
Gzip archives (.tar.gz) and Zipped XML files with
a root, such as DOCX files. 

The strategies are very different, even if they 
both rely on the extensible Extra Field which is sim- 
ilar in both formats. For DOCX, it's a merge of two 
documents inside the same Zip, with two versions of 
the same root file. For Gzip archives, Extra Fields 
are used as comments, and two independent archives 
are interleaved via two chains of skip and data. 

Other formats aren't playing that nice: Bzip2
is a pure compressor, bit-based with only bit alignment,
no padding, and no form of comments.
Other formats such as XZ, AR or Compress (.Z
archives) are just too simple for any exploitation.
RAR applies CRC16 to headers, which does not help
our cause.

¹⁵ git clone https://github.com/brimstone/fastcoll

Thanks to Yann Droneaud for the TAR.GZ
challenge, and Philippe Lagadec for the DOCX chal- 
lenge! 


Still using MD5? It might feel useless to still
care about MD5, but as MD5, SHA1, and SHA2
use the same construct, exploits of hash collisions
via file format tricks will be re-usable for other
hash collisions while being cheaper to pull off with
MD5. These techniques would work for SHA1 via
the Shambles attack too, except that it costs $45,000
USD to compute it. And at least, MD5 is still
widespread enough that it has enough targets to at- 
tack in practice, unlike MD2 and MD4! 

You might be tempted to still use MD5 to des- 
ignate a file, but using MD5 will expose you to all 
kinds of tricks and confusion that SHA2 or Blake2 
don't. 

Fastcolls are very quick to compute and can be 
chained, providing one bit of stored data while keeping
the MD5 constant.¹⁵ They will make it trivial to
watermark a file, and a very short shellcode can eas- 
ily detect which version of the file is running, then 
adjust its behavior accordingly. Using a stronger al- 
gorithm would prevent any possible pranks or con- 
fusion, at least for some years until we get better 
collisions. 


Bonus: ZGIP Zip can use Deflate among other 
compression algorithms. On the other hand, Gzip 
only uses Deflate. 

Both are wrapping Deflate data around differ- 
ent structures that are not compatible. By abusing 
structures, it’s possible to make ZGIP, a chimera 
of Zip/GZIP: a polyglot file sharing the compressed 
data.¹⁶

By abusing Deflate stored blocks and dummy 
members, it's even possible to partially hide some 
data from the other format, even if they belong to 
the same stream. 

In short, this is just going the extra mile to prove
that GZIP is not a wrapper around Zip, nor is Zip
a wrapper around GZIP.¹⁷


¹⁶ git clone https://github.com/corkami/pocs; find pocs -name zgip
¹⁷ https://speakerdeck.com/ange/gzip-equals-zip-equals-zlib-equals-deflate

Bonus: The craziest colliding file The latest
advanced MD5 manipulation is a very clever ZStandard+Tar
hashquine+polyglot by David ‘Retr0id’
Buchanan,¹⁸ also known for his beautiful PNG
hashquine.¹⁹

It can be either just a Zst file or a Tar.zst,
so the Tar header can be toggled on or off, as well
as the complete Tar checksum. To be a reusable
hashquine, it's able to output any MD5 and Tar
checksum while keeping the whole file's MD5 constant.

The same prefix is reusable in three differ- 
ent ways. First it can be a pure ZStandard file 
hashquine. 


$ md5sum hashquine.zst 
720ca7f6842f1a608fcb924f5811ebb9 *hashquine.zst 


$ zstd -cd hashquine.zst 
The MD5 of hashquine.zst is: 
720ca7f6842f1a608fcb924f5811ebb9


Second, it can be a Zstandard(tar) file. 


$ md5sum hashquine.tar.zst 
703911cf9e409965cebd05392acc1503 *hashquine.tar.zst 


$ tar -Oxf hashquine.tar.zst hash.md5 
The MD5 of hashquine.tar.zst is: 
703911cf9e409965cebd05392acc1503 


Finally, it can be a self-checked “auto-manifest” 
Tar.zst. 


$ md5sum self.tar.zst 
f068d54fabb12dbb1b359745a80d78fc *self.tar.zst


$ tar -xvf self.tar.zst 
x hash.md5 
x hello.txt 


$ cat hash.md5 
f068d54fabb12dbb1b359745a80d78fc *self.tar.zst


ed076287532e86365e841e92bfc50d8c *hello.txt 


$ md5sum -c hash.md5 
self.tar.zst: OK 
hello.txt: OK 


The whole prefix uses 653 Unicolls to toggle Zstandard
frames and output optional contents after
decompression.

For the optional Tar header (generic for any
hash.md5 contents), it uses one frame for the constant
Tar header start, 8*11 frames for the hash.md5
file size in octal, one frame for the constant Tar
timestamp 14412572240, 8*6 frames for any Tar
header checksum in octal, and one frame for the rest
of the Tar header.

For the optional text prefixes in the file contents,
it uses one frame for the constant prefix of “The MD5
of hashquine.tar.zst is” and another for “The MD5 of
hashquine.zst is” in ASCII. Finally, it uses 32*16 collisions
for all nybble possibilities of an MD5 hash.
That adds up to 1 + 88 + 1 + 48 + 1 + 2 + 512 = 653.




Bonus: Wordpad weird files Colliding .docx 
files will show the same document with Microsoft 
Wordpad. It turns out that Wordpad ignores the 
root files entirely, and just locates the document file 
via the Content Types files. Really! 

As you would expect with such sloppiness, it
doesn't check if all files in the archive are declared
in the Content Types file, which can turn any Zip
archive into a very weird .docx that is Wordpad-only
with just two XML files. Sadly, this issue being
far from a standard file, Wordpad is confused as it
should be, and we can't make this issue a Wordpad-compatible
DocX file too. Extract an example from
this PDF's attachments.²⁰


¹⁸ git clone https://github.com/corkami/collisions; find collisions -name hashquines
¹⁹ unzip pocorgtfo22.pdf retr0id.zip; unzip retr0id.zip hashquine_by_retr0id.png
²⁰ unzip pocorgtfo22.pdf mini.docx


22:06 Mitigations are a reverser's friend; or, Abusing XFG


Control flow integrity protections, with their various
implementations, have been the latest iteration
of compiler mitigations for memory corruption
exploits. They hope to make code reuse attacks
more difficult or impossible. Implementation details
vary, but all boil down to restricting the possible valid
targets of indirect calls. LLVM's is called “Control
Flow Integrity,” Grsecurity has “Reuse Attack
Protector,” and Microsoft's is called “Control Flow
Guard” (CFG).

The core idea behind Microsoft's CFG is ensuring
that function pointers can only point to valid
function entry points before being used to perform 
a function call. The compiler inserts checks that, 
during runtime, inspect every indirect call instruc- 
tion and terminate the process if the target isn't a 
valid and known function start. 

Putting aside the completeness or effectiveness 
of this mitigation, let's ask whether we can glean 
some extra information about the code itself by the 
presence of these checks. As Deroko points out in 
Control Flow Guard Instrumentation,²¹ CFG mechanisms
can serve as a way to hook all indirect calls
in a binary without specifically looking for them in 
advance. They can also precisely identify function 
entry points, which is not always a trivial task. 


ELIMINATES THE MIDDLEMAN!

AN INCREMENTAL RECORDER FOR $3750

COMPLETE, READY TO OPERATE
Build it in (Rack Mount) or carry it (Portable)

DIGITAL STEPPING RECORDER

STORE DATA AT ANY RATE, 0 TO 400 CHARACTERS PER SECOND
READ ON ANY IBM COMPATIBLE TRANSPORT
GENERATES FULLY COMPATIBLE IBM TAPE AUTOMATICALLY!

• Choice of 200 or optional 556 bits per inch
• Standard IBM type 10½" reels, ½" tape
• 7 track standard spacing
• ¾" inter record gap with longitudinal check character
²¹ unzip pocorgtfo22.pdf cfghook.zip

by Aleksandar Nikolic


With the release of Windows 11, Microsoft is in- 
troducing another iteration of control flow integrity 
mitigation called “eXtended Flow Guard” or XFG. 
In short, it further restricts targets of indirect calls 
to not only valid function entry points, but to a 
subset of functions that have a particular signature 
consisting of return value type, number and types 
of parameters and other function properties. 

Surely, this added metadata can somehow aid us 
in our reverse engineering process. To see how, we’ll 
need to understand the implementation details. 


What is XFG and how it works 


Extended Flow Guard is introduced as a compiler
extension that can be enabled via the /guard:xfg
switch, which has been available in MS's C and C++
compilers since at least the 19.27.29112 version
shipped with Visual Studio 2019. It didn't see full
support or much public use until the release of
Windows 11. Consider an example:


int test(){
  return 0;
}
int (*cfgTest[1])() = {test};
int main(){
  cfgTest[0]();
}


This code has a simple function pointer array 
cfgTest and makes a call to test using that func- 
tion pointer. If compiled with cl /Zi /guard:xfg 
simple.c its assembly looks a little odd. 


sub   rsp, 38h
mov   eax, 8
imul  rax, 0
lea   rcx, cfgTest
mov   rax, [rcx+rax]
mov   [rsp+38h+var_18], rax
mov   r10, 0D30527475E523070h
mov   rax, [rsp+38h+var_18]
call  cs:__guard_xfg_dispatch_icall_fptr
xor   eax, eax
add   rsp, 38h
retn


This is some peculiar code. There is no indirect
call to function test, rather there's a call to
__guard_xfg_dispatch_icall_fptr with certain
arguments. The function pointer is actually saved
in rax and an odd-looking constant is moved into
r10 before __guard_xfg_dispatch_icall_fptr is
called. This odd-looking constant is what we will
call an XFG hash. Interestingly, if we take a look
at test function's prologue in Figure 4, we'll see
(almost) the same data.

Long story short, before invoking the target
function, __guard_xfg_dispatch_icall_fptr will
check that the hash in r10 matches the hash located
right before the function. If they don't match, the
process is terminated.²²
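As a rough model of that check, consider the two constants from the listings: the call site loads a value ending in 70, while the value stored before test ends in 71. The Python sketch below assumes the dispatcher sets the lowest bit of the call-site constant before comparing; whether the real code does this with an OR or an add is not shown in this article, and the function name `xfg_allows` is made up for the example.

```python
CALL_SITE_CONSTANT = 0xD30527475E523070   # moved into r10 in main
STORED_BEFORE_TEST = 0xD30527475E523071   # the dq right before test


def xfg_allows(call_site_constant: int, stored_hash: int) -> bool:
    """Model of the dispatch check: compare the call-site constant,
    with its lowest bit set, against the hash preceding the target."""
    return (call_site_constant | 1) == stored_hash


print(xfg_allows(CALL_SITE_CONSTANT, STORED_BEFORE_TEST))   # call proceeds
print(xfg_allows(CALL_SITE_CONSTANT, 0x85F13E9656DA4871))   # process would die
```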

This ensures that only legal target functions can 
be executed at this particular indirect function call. 
The next obvious question is: how is this function 
hash derived? That brings us to the core idea be- 
hind XFG. 

If we think about it, no matter how an indi- 
rect call instruction happens to be generated by the 
compiler, several things are true for all the possi- 
ble, valid, target functions in a valid program. All 
possible target functions must have the same num- 
ber of arguments, the same calling convention, same 
argument types, same return value type and so on. 
Even if the compiler doesn't know of all possible tar- 
get functions in advance, it must know all of these 
facts about those targets. It can, then, generate a 
unique representation of those facts when it encoun- 
ters an indirect function call. On the other hand, for 
every function that could be a possible target for in- 
direct call, the same unique representation can be 
calculated and those two can be compared during 
runtime. 

This unique representation of function prototype 
information is what constitutes an XFG hash. 


How is an XFG hash generated? 


Francisco Falcon over at Quarkslab has already done
the hard work of reverse engineering most of XFG
internals. Their extended writeup provides a number
of examples.²³ XFG hash generation happens
in the cl.exe compiler's frontend c1.dll and revolves
around gathering function prototype information
and using the SHA256 hashing algorithm on it
while following certain rules. A list of function
properties that figure into the XFG hash is (as far as C
code is concerned at least) as follows:


• number of arguments
• the types of individual arguments
• type of return value
• whether the function is variadic or not
• the calling convention


When preparing to calculate the hash, each of 
these is represented in a specific way. Some are sim- 
ple constants, while others have more structure and 
are often recursively defined. For example, the num- 
ber of arguments is just represented as a 32-bit in- 
teger, the calling convention appears to be a 16-bit 
constant, and the variadic flag is a one-byte boolean. Return
value and argument types, on the other hand, are 
more complicated. 

Those consist of values specifying type qualifiers
(const, volatile), type groups (primitives, pointers,
structs/unions/enums), and values according to
the type group. Calculating the value for a primitive
type is the simplest case and is just a table lookup:

"void"                   : 0xe,
"char"                   : 0x1,
"signed char"            : 0x1,
"unsigned char"          : 0x1,
"__int8"                 : 0x1,
"char8_t"                : 0x1,
"__int16"                : 0x6,
"short int"              : 0x6,
"unsigned short int"     : 0x86,
"float"                  : 0x11,
"int"                    : 0x7,
"__int32"                : 0x7,
"unsigned int"           : 0x87,
"long int"               : 0x10,
"unsigned long int"      : 0x8a,
"double"                 : 0x12,
"__int64"                : 0x8,
"long double"            : 0x12,
"long long int"          : 0x8,
"unsigned long long int" : 0x88,
"unsigned long long"     : 0x88,


Notice that there are several distinct primitive 
types that have the same value. Structs, unions, and 
enums are treated the same, and their actual (ver- 
batim text) names are included as part of a hash 
calculation. 


²² A great in-depth description from Connor McGarr is available as Exploit Development: Between a Rock and a (Xtended Flow) Guard Place: Examining XFG.

²³ See How the MSVC Compiler Generates XFG Function Prototype Hashes by Francisco Falcon.


.text:0000000140001008      dq 0D30527475E523071h
.text:0000000140001010
.text:0000000140001010 ; ============= SUBROUTINE =============
.text:0000000140001010
.text:0000000140001010
.text:0000000140001010 ; int test(...)
.text:0000000140001010 test proc near   ; DATA XREF: .rdata:__guard_fids_table
.text:0000000140001010                  ; .data:cfgTest
.text:0000000140001010      xor  eax, eax
.text:0000000140001012      retn
.text:0000000140001012 test endp

Figure 4: test Function's Prologue


Pointers of any kind are the most complicated, 
as their value is the hash of the type they point to, 
requiring recursive evaluation. 

This can look a bit confusing and — although 
it's covered in great detail in the referenced Quark- 
slab article — we'll illustrate the process with the 
simplest example. We'll add a void pointer as an 
argument to test from before: 


int test(void *arg); 


First, there's only a single argument to this function,
so we will append “\x01\x00\x00\x00” to our
data to be hashed (data0). Second, we need to consider
function arguments, calculate their hashes, and
append them to the data to be hashed. There is only
one argument and it's a pointer without qualifiers.
Starting a new hash (data1), we append “\x00” for
qualifiers and “\x03” for the type group, but then we need to
consider the type the pointer points to and calculate that hash
separately. Starting yet another hash calculation
(data2), we append “\x00” for qualifiers, “\x01” for
the type group, and finally “\x0e” for the primitive type.
Calculate the SHA256 of data2 and append its first
8 bytes to data1, which completes the data necessary for
calculating the first argument's hash. Hash data1 and
append the first 8 bytes to data0. That concludes
the argument part of the hash. Next is whether
the function is variadic, so we append “\x00”, and
what the calling convention is, which defaults to just
“\x01”. The last segment is the return value type,
which is an integer primitive, so it's simply “\x00”
for qualifiers, “\x01” for the type group, and finally “\x07”
for the primitive type. The first 8 bytes of that hash are
appended to data0.

Putting that together gives us the following, with
all SHA256 results truncated to the first eight bytes.

sha256("\x01\x00\x00\x00"
  +sha256("\x00\x03"+sha256("\x00\x01\x0e"))
  +"\x00"+"\x01"+sha256("\x00\x01\x07"))


After some final transformations, the result of 
the operation is the “719a5e103606e1b2” value that 
appears before the test function in the binary. 
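The nested expression for int test(void *arg) translates almost directly into Python. This is only a sketch of the steps spelled out above: the helper name `h8` is made up, and the final transformations are not described in this article, so they are left out and the printed digest is not expected to equal the value found in the binary.

```python
import hashlib


def h8(data: bytes) -> bytes:
    """SHA256 truncated to its first eight bytes."""
    return hashlib.sha256(data).digest()[:8]


# data2: the pointee, an unqualified primitive (void = 0x0e)
void_type = h8(b"\x00" + b"\x01" + b"\x0e")

# data1: an unqualified pointer (type group 3) to that type
void_pointer = h8(b"\x00" + b"\x03" + void_type)

# return type: an unqualified primitive (int = 0x07)
int_return = h8(b"\x00" + b"\x01" + b"\x07")

# data0: argument count, argument hashes, variadic flag,
# calling convention, return type hash
data0 = (
    (1).to_bytes(4, "little")
    + void_pointer
    + b"\x00"
    + b"\x01"
    + int_return
)

raw_digest = h8(data0)   # before the final transformations
print(raw_digest.hex())
```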

An implementation of this algorithm, in Python,
that parses a given C function prototype and generates
its corresponding hash can be found as an
attachment.²⁴

²⁴ unzip -p pocorgtfo22.pdf xfg-scripts-args.tgz | tar -xzvf- gen_hash_from_ast.py


Using XFG to resolve indirect jumps 


Now that we know how XFG works, we can consider 
how it can be of use as a reverse engineering aid. 

The first, and most obvious idea is that it
can reduce the uncertainty of analyzing indirect
calls. Since all indirect calls in an XFG-protected
binary will inevitably be dispatched
through __guard_xfg_dispatch_icall_fptr, which
must match the call site's hash against the target
function's hash, it should be possible to enumerate all possible
targets completely statically (assuming all possible
linked code is known/available for analysis).

Let's illustrate this with an example. Throughout
the rest of the article, we'll use the ntdll.dll binary
from Windows 11 for illustrations and testing.
If we go to function LdrQueryProcessModuleInformationEx
and take a look at the following piece of
assembly:


18000174e  488d04bf
  lea rax, [rdi+rdi*4]
180001752  49ba7048da56963e...
  mov r10, 0x85f13e9656da4870
18000175c  498b44c118
  mov rax, qword [r9+rax*8+0x18]
180001761  ff15a9181900
  call qword
    [rel __guard_xfg_dispatch_icall_fptr]
    {j_sub_1800aa130}
180001767  4c8d0df2b71200
  lea r9, [rel data_18012cf60]


While we don't know without debugging
what possible target this XFG dispatch call
might have, we can see that its hash must
be 0x85f13e9656da4871 (the 1 is added at the
end of the supplied hash by the dispatcher). If
we search the binary for functions that have
this XFG hash, we'll find many results:
LdrQueryModuleInfoLocalLoaderUnlock, LdrShutdownThread,
LdrShutdownProcess, RtlDetectHeapLeaks,
TpTrimPools, RtlCleanUpTEBLangLists,
RtlFreeThreadActivationContextStack,
LdrProcessInitializationComplete, RtlFlushHeaps,
RtlReleasePebLock, RtlAcquirePebLock,
LdrFastFailInLoaderCallout,...

Obviously, from the function names, not all of 
these make sense as possible targets for this indirect 
call because of their differing semantics, but there's 
a good chance that all with Ldr prefix are actual 
possible targets. 
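The search itself is a simple grouping operation. The Python sketch below assumes that the eight bytes preceding each function have already been read out of the binary into a dictionary; here that dictionary is filled by hand with a few of the names and hashes quoted in this article, and the helper names are made up for the example.

```python
from collections import defaultdict

# Hand-filled sample; a real tool would read these from the binary.
FUNCTION_HASHES = {
    "LdrShutdownThread": 0x85F13E9656DA4871,
    "LdrShutdownProcess": 0x85F13E9656DA4871,
    "RtlFlushHeaps": 0x85F13E9656DA4871,
    "TppWorkpFree": 0xA6D127F578511271,
    "TppFreeWait": 0xA6D127F578511271,
}


def index_by_hash(function_hashes):
    """Group function names by the XFG hash stored before them."""
    table = defaultdict(list)
    for name, value in function_hashes.items():
        table[value].append(name)
    return table


def candidate_targets(table, call_site_constant):
    """Functions whose stored hash matches the constant loaded into r10,
    once its lowest bit is set."""
    return sorted(table.get(call_site_constant | 1, []))


table = index_by_hash(FUNCTION_HASHES)
print(candidate_targets(table, 0x85F13E9656DA4870))
print(candidate_targets(table, 0xA6D127F578511270))
```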

Why are there so many hash hits that are unlikely
to be real targets? It's probable that the target
function prototype in this case is very simple,
and matches many other functions. In fact, hash
0x85f13e9656da4871 represents the simplest possible
case of ‘void fname()’. As another example,
the TppCallbackEpilog function has the following
indirect call:


18001766e  488b8eb8000000
  mov rcx, qword [rsi+0xb8]
180017675  4c89aeb8000000
  mov qword [rsi+0xb8], r13 {0x0}
18001767c  488b4108
  mov rax, qword [rcx+0x8]
180017680  49ba70125178f527...
  mov r10, 0xa6d127f578511270
18001768a  488b4008
  mov rax, qword [rax+0x8]
18001768e  ff157cb91700
  call qword
    [rel __guard_xfg_dispatch_icall_fptr]
    {j_sub_1800aa130}


Looking up the target hash, 0xa6d127f578511271,
in the binary yields: TppSimplepFree,
TppWorkpFree, TppAlpcpCallbackEpilog,
TppJobpCallbackEpilog, TppFreeWait, TppTimerpFree,
TppIopFree, TppAlpcpFree, TppJobpFree,
TppWorkCancelPendingCallbacks,
TppIopCancelPendingCallbacks.

All of these look like possible real targets given 
their context. 

So while not completely precise, this simple 
static analysis that relies on XFG hashes definitely 
sheds some light on indirect calls that might other- 
wise remain completely unresolved. 

Attached is a Binary Ninja plugin that annotates
indirect calls with information gained by XFG analysis.²⁵

²⁵ unzip -p pocorgtfo22.pdf xfg-scripts-args.tgz | tar -xzvf- xfg_analyzer.py


Brute forcing XFG hashes for function 
prototype recovery 


[Advertisement: Components, Inc. solid tantalum capacitors]

Another, more involved, idea stems from the fact
that XFG hashes aren't random and actually encode
function prototypes. Surely, there would be a
way to recover at least some of that information and
make use of it.

While it is not possible to reverse the hash back
to a function prototype directly, it is perfectly feasible
to precompute a lookup table for all possible
function prototypes (up to a certain number of arguments).
If we ignore structs, unions and enums for
a second, there are only a fairly small number of
primitive types. In fact, if we remove the duplicates,
there's a total of only 12 primitive types (with distinct
values as far as XFG generation is concerned).
Adding in type qualifiers (const, volatile) and
pointers, a bit of simple combinatorics tells us that
the total number of all possible function prototypes is
roughly $(12 \times 3)^{\text{num\_args}+1}$.

This gets big very fast as we increase the number
of arguments, but the whole list is precomputed in
minutes for functions up to three arguments.

import sys
import itertools
from jinja2 import Template

types = ["void", "char", "short int",
         "unsigned short int", "float",
         "int", "unsigned int", "long int",
         "unsigned long int", "double",
         "long long int", "unsigned long long"]
# add all types as pointers
types += [x + " *" for x in types]
# and as consts
types += ["const " + x for x in types]
# and as volatiles
types += ["volatile " + x for x in types]

# one C prototype per rendered template
j2_template = Template("""
{{ret_type}} fname( {%- for param_type in param_types -%}
 {{param_type}} arg{{loop.index}}{{ "," if not loop.last }}
 {%- endfor -%});
""")
max_func_params = 3
f = open(sys.argv[1], "w")
i = 0
for ret_type in types:
    # zero parameters covers prototypes such as void fname()
    for pn in range(0, max_func_params + 1):
        for c in itertools.product(types, repeat=pn):
            f.write(j2_template.render({"ret_type": ret_type,
                                        "param_types": c}))
            i += 1
f.close()
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'This code uses a jinja2 template to generate an 

exhaustive list of all possible function prototypes 
starting with given primitive types. These generated 
prototypes can then be fed into the hash generation 
algorithm to compile a lookup table. 
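
As a rough sanity check on the size of that list, the sketch below counts the prototypes implied by the script's type list and shows the shape of the lookup step. Note that `toy_hash` is only a placeholder (a truncated SHA-256 of the prototype string) and is an assumption made for illustration; it is not the XFG hash, and `build_table` and `prototype_count` are hypothetical helper names.

```python
import hashlib
import itertools

def expand(base):
    """Apply the same three doubling passes as the generator script."""
    types = list(base)
    types += [t + " *" for t in types]
    types += ["const " + t for t in types]
    types += ["volatile " + t for t in types]
    return types

def prototype_count(n_types, max_params):
    """One return type times every parameter tuple of length 0..max_params."""
    return n_types * sum(n_types ** k for k in range(max_params + 1))

def toy_hash(prototype):
    """Placeholder for the XFG hash: first 8 bytes of SHA-256, as hex."""
    return hashlib.sha256(prototype.encode()).hexdigest()[:16]

def build_table(types, max_params):
    """Map hash -> prototype string for every generated prototype."""
    table = {}
    for ret in types:
        for n in range(max_params + 1):
            for params in itertools.product(types, repeat=n):
                args = ",".join(f"{p} arg{i}" for i, p in enumerate(params, 1))
                proto = f"{ret} fname({args});"
                table[toy_hash(proto)] = proto
    return table

base = ["void", "char", "short int", "unsigned short int", "float", "int",
        "unsigned int", "long int", "unsigned long int", "double",
        "long long int", "unsigned long long"]
all_types = expand(base)

# A deliberately tiny table (two base types, up to two parameters).
small = build_table(expand(["void", "int"]), 2)
```

With the real hash function swapped in for `toy_hash`, recovering a prototype is a single dictionary lookup per hash found in the binary.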
So, does this work? We'll test this on ntdll.dll
again. This particular version of the DLL has a total
of 1564 functions that have an XFG hash associated
with them. Out of those 1564, there are a total of
995 unique XFG hashes. After lookups, this simple
matching has identified function prototypes for
131 unique hashes, corresponding to a total of 294
functions!

By simply precomputing all possible function
prototypes up to three parameters (using nothing
target specific, only primitive types) we were able
to recover precise function prototypes for about 13%
of unique hashes in ntdll.dll. Figure 5 has some
examples.

The proof-of-concept works, but there are a couple
of reasons why we didn't get a higher hit rate.
First and most obvious is that many functions simply
have more than three arguments, but an even bigger
factor is that the ntdll.dll code relies heavily
on structures, enums, and structure pointers.
Since hashes for struct, union, and enum types
include their names directly, straight-up brute forcing
isn't practical, but seeding certain (domain-specific)
names would greatly increase the hit rate. The XFG
hash calculation implementation supports structs in
function prototypes, and since structs, enums, and
unions are treated the same, all we need to do to
add struct names is to expand the list of primitive
types. Adding struct in_addr to the list of primitive
types leads to the following result:


71394252a15b764e8 char *func(
    const struct in_addr *arg1, char *s)


This calculated hash matches the XFG hash for
RtlIpv4AddressToStringA. By adding target-specific,
commonly used structs to prototype generation
we can greatly increase the number of found
hashes at the expense of a larger lookup table. Since
structures and other type information are sometimes
publicly available even if function prototypes are
not, this allows for very precise function prototype
recovery.

How do we know that these results are actually
correct? Let's take another look at an example
where we do know the function prototype. Function
RtlSetUserValueHeap has four arguments. Binary
Ninja guesses its prototype to be:


void* const* RtlSetUserValueHeap(
    int64_t arg1, int32_t arg2,
    int64_t arg3, int64_t arg4);


Similarly, IDA guesses: 


char __fastcall RtlSetUserValueHeap(
    __int64 a1, unsigned int a2,
    __int64 a3, __int64 a4)


This function’s XFG hash is 0xc76c3600585af171
and a lookup reveals the following function
prototype:


char RtlSetUserValueHeap(
    void *arg1, unsigned long int arg2,
    void *arg3, void *arg4);


Notice how neither Binary Ninja nor IDA can
know that some of the arguments are pointers.
This simple fact adds a lot of information that
greatly aids further function analysis and
decompilation. And what about correctness? While source
for RtlSetUserValueHeap isn't available, it is
reimplemented in ReactOS where its function prototype
is:


BOOLEAN 
NTAPI 
RtlSetUserValueHeap( 
_In_ PVOID HeapHandle, 
_In_ ULONG Flags, 
_In_ PVOID BaseAddress, 
_In_ PVOID UserValue 
); 


While the prototype gathered from XFG analysis
lacks some extra annotations, the types themselves
match precisely!


In Conclusion 


Even though mitigations like XFG pose a real chal- 
lenge when it comes to exploitation, it sometimes 
pays off to take a step back and consider the possible 
side effects that can be handy in other ways. We've 
shown that a very simple lookup table can recover 
a treasure trove of information that can be helpful 
when reverse engineering an XFG-protected binary. 
As XFG adoption spreads to code other than Mi- 
crosoft’s, this can definitely lead to some interesting 
discoveries. 


char RtlGetSecurityDescriptorRMControl(void *arg1, char *arg2);
unsigned long int RtlNumberOfSetBitsUlongPtr(unsigned long long int arg1);
char RtlEqualWnfChangeStamps(unsigned long int arg1, unsigned long int arg2);
unsigned long int RtlSetProxiedProcessId(unsigned long int arg1);
void RtlWnfDllUnloadCallback(void *arg1);
void *memchr(const void *arg1, int arg2, unsigned long long arg3);
char *strchr(const char *arg1, int arg2);
unsigned long long strcspn(const char *arg1, const char *arg2);
unsigned long long strnlen(const char *arg1, unsigned long long arg2);
char *strpbrk(const char *arg1, const char *arg2);
char *strrchr(const char *arg1, int arg2);
unsigned long long strspn(const char *arg1, const char *arg2);
char *strstr(const char *arg1, const char *arg2);
int tolower(int arg1);
int WinSqmCommonDatapointSetDWORD64(
    unsigned long int arg1, unsigned long long arg2, unsigned long int arg3);
int WinSqmCommonDatapointSetString(
    unsigned long int arg1, const unsigned short int *arg2, unsigned long int arg3);
int WinSqmGetInstrumentationProperty(
    const unsigned short int *arg1, const unsigned short int *arg2,
    unsigned short int *arg3, unsigned long int *arg4);
int WinSqmIsOptedInEx(unsigned long int arg1);
void AlpcGetCompletionListLastMessageInformation(
    void *arg1, unsigned long int *arg2, unsigned long int *arg3);
unsigned long int DbgPrompt(const char *arg1, char *arg2, unsigned long int arg3);
char RtlQueryProcessPlaceholderCompatibilityMode();
char RtlSetProcessPlaceholderCompatibilityMode(char arg1);
char RtlIsNonEmptyDirectoryReparsePointAllowed(unsigned long int arg1);
char RtlIsZeroMemory(void *arg1, unsigned long long arg2);
unsigned short int RtlLogStackBackTrace();
void *RtlLogStackTrace(unsigned long int arg1);
void RtlReleaseStackTrace(void *arg1);


Figure 5: Example Prototypes from ntdll.dll

22:07 Timecryption, OTP with Near-polyglots

by Ange Albertini and Stefan Kölbl


Our foundation for this is the CounTeR (CTR)
block cipher mode, which effectively turns a block
cipher into a stream cipher. From a Nonce and a
Key, it generates a keystream. The plaintext is then
xored with this keystream to obtain the ciphertext.
This mode acts as a one-time pad. It's just an xor
against a keystream, so encryption and decryption
are the same operation. The cipher’s decryption
operation itself isn’t used. If we decrypt with a
different key, we end up xoring with a different keystream.


What about crafting an ambiguous ciphertext? 
We define this as a ciphertext that gives meaningful 
plaintexts for different keystreams! 
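
To make the definition concrete, here is CTR reduced to its essence: a keystream derived from (Nonce, Key) and an xor. This is a minimal sketch, assuming a toy keystream built from SHA-256 as a stand-in for a real block cipher such as AES; the names `keystream` and `ctr_xor` are hypothetical and not taken from any tool mentioned in this article.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR keystream: hash (key, nonce, counter) blocks and concatenate."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: both are the same xor against the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

nonce = b"\x00" * 12
ciphertext = ctr_xor(b"key one", nonce, b"attack at dawn")
roundtrip = ctr_xor(b"key one", nonce, ciphertext)  # same key: plaintext back
other = ctr_xor(b"key two", nonce, ciphertext)      # other key: other bytes
```

Decrypting with the second key never fails; it simply produces a different byte string, which is the property an ambiguous ciphertext builds on.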

To do this, recall that we can freely modify the
ciphertext: the keystream is set by (Nonce, Key)
alone, and neither plaintext nor ciphertext is involved.
This means that for a given keystream, if we change
ciphertext bytes, we set the plaintext bytes, as it's
simply an xor against a keystream.

So, we can directly create such a ciphertext with
a binary polyglot whose interpretation varies by the
key. We just independently encrypt the different
ranges of the file with the different keys, then
combine the two ciphertexts at the right offsets.
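
The combination step can be sketched as follows. This is an illustration under stated assumptions, not Mitra's implementation: the keystream is a SHA-256 based toy rather than a real cipher, the two payloads occupy disjoint byte ranges (head and tail), and `ambiguous` is a hypothetical helper name.

```python
import hashlib

def keystream(key, nonce, length):
    """Toy stand-in for a CTR keystream (SHA-256 of key, nonce, counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a, b):
    """Bytewise xor of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def ambiguous(nonce, key1, first, key2, second):
    """Ciphertext whose head decrypts under key1 and tail under key2."""
    total = len(first) + len(second)
    ks1 = keystream(key1, nonce, total)
    ks2 = keystream(key2, nonce, total)
    head = xor(first, ks1[:len(first)])    # range owned by key1
    tail = xor(second, ks2[len(first):])   # range owned by key2
    return head + tail

nonce = b"N" * 12
ct = ambiguous(nonce, b"K1", b"HEAD-FOR-K1|", b"K2", b"|TAIL-FOR-K2")
view1 = xor(ct, keystream(b"K1", nonce, len(ct)))
view2 = xor(ct, keystream(b"K2", nonce, len(ct)))
```

Under the first key the head is meaningful and the tail is noise; under the second key it is the other way around. A polyglot-friendly file format is what lets each parser ignore the noise.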


Making Decryption Relative to Time 


But we run into a key question: how do we have an 
uncooperative system decrypt to two different re- 
sults? We postulate that in real-world applications, 
specifically those having key rotation, we can do this 
leveraging time. 

If we know the key rotation scheme used by a sys- 
tem, we can craft a file that, when encrypted with 
the current key, might be authentically decrypted 
later with a different key added to the key ring. 
(Typically, newest keys are tried first, and decrypted 
plaintext is returned as soon as the decryption is 
authenticated.) So the file will be transparently de- 
crypted to something else, something that you de- 
cided in advance: 

Timecryption combines what you want now with 
what you want later. You control both. When im- 
plemented against a known key rotation scheme, it's 
transparent and works as intended. 


Near-Polyglots 


Typically, each ciphertext byte belongs to one
payload and one only. But if we leverage two keys from
the key rotation scheme (K1 for now and K2 for
later) we can bruteforce a nonce that will get some
bytes decrypted to two different sets of values.

This means that we can make two formats that 
will coexist in the same file starting at offset zero, 
such as PDF/PE or JPG/PNG, or the same format 
twice, where JPG/JPG would be a near ambiguous 
file. 


We identify two ways to handle these
pairs of files with the same format. One way to
do this is with a technique such as causing a
different comment length, a bit like a hash collision for
JPG/JPG. In this case, it's a file with one header
and two contents. Another way is to do so for
formats that work from any offset such as
HTML/HTML. In this case, it's two contents coexisting
in the same file.

Note that the fewer bytes there are in the
overlap of the two formats, the faster the nonce
bruteforce will be! The overlap only needs to be
as long as is required given the specific formats. For
example, ICC requires any parasite to start at
offset 0x132, which is impractical to bruteforce. This
technique can be exploited quickly with formats like
JPG since it has a very small minimal offset of 4.
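
The nonce search itself can be sketched in a few lines. For the same ciphertext prefix to decrypt to header A under K1 and header B under K2, the two keystream prefixes must xor to A xor B, so each overlapping byte multiplies the expected work by 256. The code below is an assumption-laden illustration: it uses a SHA-256 based toy keystream instead of a real cipher, a two-byte overlap to keep the search short, and a hypothetical `find_nonce` helper.

```python
import hashlib

def keystream(key, nonce, length):
    """Toy stand-in for a CTR keystream (SHA-256 of key, nonce, counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a, b):
    """Bytewise xor of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def find_nonce(key1, key2, magic1, magic2, limit=1 << 22):
    """Search for a nonce where one ciphertext prefix yields both magics."""
    n = len(magic1)
    assert len(magic2) == n
    want = xor(magic1, magic2)
    for counter in range(limit):
        nonce = counter.to_bytes(12, "big")
        delta = xor(keystream(key1, nonce, n), keystream(key2, nonce, n))
        if delta == want:
            return nonce
    return None

# Two-byte overlap: a JPG start-of-image marker versus a PE "MZ" signature.
nonce = find_nonce(b"K1", b"K2", b"\xff\xd8", b"MZ")
prefix = xor(b"\xff\xd8", keystream(b"K1", nonce, 2))
```

With two bytes the loop needs about 65,536 attempts on average; a four-byte overlap would need about four billion, which is why the minimal offset of the format matters so much.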

The Mitra repository has all the tooling for CTR,
OFB and GCM modes with precomputed examples.[26]




[26] git clone https://github.com/corkami/mitra
[27] git clone https://github.com/kste/keycommitment


With Authenticated Encryption 


In the case of CTR encryption, it's possible to
change keys because the encryption is unauthenticated,
a known security risk. For this reason, the
Galois/Counter Mode (GCM) was created, which is
just CTR with authentication via extra authentication
data and a tag. However, it’s possible to forge
one of the blocks such that decryption will be valid
for several keys, so GCM is vulnerable too.

More complex modes are exploitable too, such as
OCB3 and GCM-SIV.[27] These cipher
modes work at the block level and not at the byte
level, so you need to align payloads to the block
boundary. They also require more than one block
to compute the authentication collision, but that's
a small overhead.[28]

It's even possible to set an arbitrary content in
the authentication tag!

Authenticated encryption isn't enough if the key
isn’t committed to the encryption. It’s possible
to craft ciphertexts that authentically decrypt with
different keys, which is something that multiple
schemes were independently found vulnerable to.[29]


Conclusion 


Near-polyglots are the starting point for funky
polyglot-like files with cryptography, whether for
AngeCryption (ECB, CBC, CFB and OFB) or
Timecryption (CTR, OFB, GCM, OCB3 and GCM-SIV).

Mixing near-polyglots (CTR, OFB) and forging
contents to get the same authentication tag is
possible for the GCM, OCB3 and GCM-SIV modes.[30]

Mitra's handling of near-polyglots makes it very
easy to merge dozens of different file formats, and
the key commitment tools forge the tags. Using
these techniques and tools, exploiting authenticated
collisions only requires a few command line
invocations!


[28] Note that GCM-SIV's computation cost is relative to payload size, so try it with smaller files first!
[29] unzip pocorgtfo22.pdf project MircoStauble.pdf, “Actually Good Encryption? Confusing Users by Changing Nonces” by Mirco Stauble
[30] unzip pocorgtfo22.pdf withoutcommit.pdf


22:08 The Электроника MK-51 is a Casio fx-2500 


Howdy y'all, 

In the USSR, there was a calculator called the
Электроника MK-51.[31] It looks an awful lot like
a calculator from Japan, the Casio fx-2500. In this
short article, I'll demonstrate that in addition to
looking similar, the MK-51 is in fact a clone of the
fx-2500, and that the Soviets went so far as to
counterfeit the NEC microcontroller inside the fx-2500
with just minor alterations. Every last bit of the
program ROMs is identical.

Let's begin with a little background. Introduced 
in 1978, the fx-2500 is a small scientific calculator 
that's held in a plastic wallet with its instruction 
manuals. The MK-51 joined it in 1982; manufac- 
tured until 2000, it also features a plastic wallet 
and instruction manuals. They both have an 8-digit
LCD. The Casio is 121x67x11 mm, while the Elek- 
tronika is 130x71x8 mm. The keyboard layouts are 
a little different, some keys in different positions. 

As I suffer from a disease late at night that in- 
volves alcohol and Ebay, it wasn't long before a few 
units of the Elektronika arrived from Ukraine. The 
Casio was surprisingly a little harder to find, but I 
found that one, too. 


[31] Elektronika MK-51, if you don't follow Cyrillic.


by Travis Goodspeed 


I first tore down the MK-51. The zebra strip on
this unit’s LCD had long since turned to stone, but
otherwise it seemed in decent condition for its age.
The calculator is built around a single microcontroller
in an epoxy blob package that rides within the
plane of the PCB, a trick that I’ve also seen for
reducing thickness on the HP-28 calculator. The blob
in my calculator had no part number, but other
sources online say that this chip is the К757ИП1-2.
In X-ray, you can clearly see the hole cut out of the
PCB to allow the CPU to fit with less thickness.

Going a little deeper, I desoldered the microcon- 
troller, cut its legs off, and dropped it in a hot bath 
of HNO3. The epoxy blob was torn off by nitric acid 
just fine, but it softened first in a way that Western 
QFP and DIP packages never do. I wonder what 
it's made of, but I'll leave that to professional
reverse engineers of Soviet plastics rather than guess.

A full die photograph is shown in Figure 7. 
Nearly a third of the die is consumed by a diffusion 
ROM, and all pins are numbered, which was quite 
nice of the designers and convenient for reverse en- 
gineers. 

Bits in a diffusion ROM are rarely surface visible,
so I had to delayer this chip in dilute hydrofluoric
acid. Ten minutes in hot Rust-Go did the job on the
very first try. Bit rows are found in groups of four,
with plenty of spacing between groups, but
MaskRomTool (presented on page 5) was able to make
short work of recognizing them.

One important trick with diffusion ROMs is that
after delayering, they sometimes have no unique
color, just a border line. So in addition to marking
the center of each bit, I had to instruct my
decoder to sample a wide stretch of pixels, recording
the darkest color in each channel. This made the
bits pop out, with just a few dozen decoding errors.
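
The sampling trick can be illustrated with a short sketch. This is not MaskRomTool's code: the image is assumed to be a plain list of rows of (R, G, B) tuples, the window is assumed to be a horizontal run of pixels, and the threshold and the "dark means 1" polarity are arbitrary choices for the example.

```python
def darkest_in_window(image, cx, cy, half_width):
    """Per-channel minimum over a horizontal run of pixels centred on a bit."""
    row = image[cy]
    lo = max(0, cx - half_width)
    hi = min(len(row), cx + half_width + 1)
    window = row[lo:hi]
    return tuple(min(px[ch] for px in window) for ch in range(3))

def classify(image, centres, half_width, threshold):
    """Mark a bit as 1 when any channel dips below the threshold."""
    bits = []
    for cx, cy in centres:
        r, g, b = darkest_in_window(image, cx, cy, half_width)
        bits.append(1 if min(r, g, b) < threshold else 0)
    return bits

# Tiny synthetic row: flat background with one thin dark border line.
bg, edge = (200, 190, 180), (90, 80, 120)
image = [[bg, bg, edge, bg, bg, bg, bg, bg, bg, bg]]
centres = [(3, 0), (8, 0)]

wide = classify(image, centres, half_width=2, threshold=128)
narrow = classify(image, centres, half_width=0, threshold=128)
```

Sampling only the marked centre (`half_width=0`) misses the border line one pixel away, while the wider window catches it.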

The ROM itself is 352 columns wide and 64 rows
tall, holding 22,528 bits or a little more than two 
kilobytes in total. While some very clever souls 
have decoded ROMs without knowing the architec- 
ture and instruction set, I wasn't very hopeful of 
doing the same. Who the hell knows what 4-bit mi- 
crocontrollers were Ivan's favorite in the eighties? 

So by this stage, I had die photographs and an
export of the physically ordered ROM bits, but not
yet a decoding into logical bytes or error correction
of the marked bits. To get further, our story leaves
the Soviet Union and moves next door, to Japan.

Figure 6: NEC D897G Die from the Casio fx-2500, Delayered

Figure 7: К757ИП1-2 from an Электроника МК-51, Top Metal

Figure 8: Mask Labels from Электроника MK-51 (top) and NEC D897G (bottom)

My sample of the Casio fx-2500 arrived a week 
after the Elektronika. From the very first glance, 
it's clear that the exterior casing of one was mod- 
eled after the other. Minor differences in size and 
plastic quality were visible, and some of the button 
rows are in a different order. 

Inside the case, the Casio also uses a single mi- 
crocontroller, but this is attached to the PCB as a 
normal QFP package, rather than sitting in a cutout 
of the PCB. It has a visible part number of NEC 
D897G. 

I decapped it in the same way as I did the Soviet
chip, then delayered it to get Figure 6. Both chips
have pin numbers around
the perimeter; both chips have layer labels between 
pins 49 and 50; and, both chips have a diffusion 
ROM that’s roughly a third of the surface area. 

The mask ROM of the NEC chip also contains 
groups of four rows, and it also contains exactly 
22,528 bits in exactly 64 rows and 352 columns. 
The bits are identical, and I was able to correct the 
few dozen bit errors that I made in the Elektronika 
extraction by having MaskRomTool flag all differences
with the Casio as errors. This shows that not
only are these two calculators running the same ar- 
chitecture, but they have exactly the same firmware. 
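
The cross-check between the two extractions amounts to a bitwise comparison of two grids. The sketch below is a hypothetical stand-alone illustration, not MaskRomTool's own difference view; the bit grids are synthetic, and only the 64 by 352 geometry comes from the article.

```python
def flag_differences(candidate, reference):
    """Return (row, col) positions where two bit grids disagree."""
    if len(candidate) != len(reference):
        raise ValueError("row counts differ")
    flagged = []
    for r, (row_a, row_b) in enumerate(zip(candidate, reference)):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {r} has mismatched width")
        flagged += [(r, c) for c, (a, b) in enumerate(zip(row_a, row_b)) if a != b]
    return flagged

ROWS, COLS = 64, 352
total_bits = ROWS * COLS        # 22,528 bits
total_bytes = total_bits // 8   # 2,816 bytes

# Synthetic reference grid, plus a copy with two deliberate bit errors.
reference = [[(r * c) & 1 for c in range(COLS)] for r in range(ROWS)]
candidate = [row[:] for row in reference]
candidate[5][17] ^= 1
candidate[40][300] ^= 1
errors = flag_differences(candidate, reference)
```

Each flagged position is then a place to look at the photograph again and decide which of the two extractions is wrong.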

Unlike the Электроника calculator, which lacks
a part number on the die, the NEC chip is labeled
with D897G. This part number isn't in my databook
collection, but the naming convention fits members
of NEC's uCOM43 family that appeared in 1977,
just a year before the calculator.

I grabbed Computer Gin and Fabulous Fred,
uCOM43 chips which had been decoded by Sean
Riddle. Between those and my handy NEC databook,
I found that Fabulous Fred begins with 8f,
91. 8f is an LDZ instruction, loading 0f into the
data pointer. 91 is an LI instruction, loading 01
into the accumulator. Computer Gin begins with
15, 3e. 15 is an ADC instruction, an add with carry.
3e is an XI instruction.

When I plug these instruction pairs into MaskRomTool’s
byte solver, it recommends a solution similar
to Riddle's ROMs.[32] With that decoding, the
ROM begins to look something like uCOM43 machine
language!


0:00: db      JCP 0x1B    Call a function
0:01: 15 5a   LDI 0x5a    Load the DP
0:03: 63      RPB 3       Clear I/O pins
0:04: 3d      XMI 1
0:05: 66      REB 2
0:06: 28      X           Move DP to A
0:07: 69      RMB 1       Clear bit of RAM
0:08: 01      DI          Disable Ints


Today I’ve shown not only that the Электроника
MK-51 and the Casio fx-2500 look
and behave similarly, but also that the MK-51 has a
clone of the NEC microcontroller from the fx-2500,
that the firmware is identical between the two
calculators, and that the instruction set is similar if not
identical to that of the NEC uCOM43 architecture. With
time this ROM dump might be built into an emulator,
though we will have to figure out after which
calculator to name the MAME module!



[32] gatorom bits.txt -o rom.bin --decode-cols-downl-swap -i -r 90


22:09 A Tourist's Guide to Reversing Renesas M16C


and the R8C, too!


Ehilà, vicino! 

Welcome to another installment of our series of 
quick-start guides for reverse engineering embedded 
systems. Our goal here is to get you situated with 
the architecture of smaller devices as quickly as pos- 
sible, with a minimum of fuss and formality. 

Those of you who have already worked with Re- 
nesas M16C or similar architectures might find it to 
be a useful refresher, while those of you new to the 
architecture will find that it really isn't as strange as 
you've been led to believe. If you've already reverse 
engineered binaries for any platform, even C-SKY 
CK803, you’ll soon feel right at home. 

We've written this guide using a device in the 
R8C/Tiny series for specific examples, but with mi- 
nor differences it applies well enough to the R8C 
and M16C families as a whole. For larger Rene- 
sas parts, such as those used in engine control units 
and portable amateur radios, you might be better 
served by a different introduction. Either way, be 
sure to keep reading for a case study on applying 
power analysis and fault injection techniques to suc- 
cessfully recover firmware from an R8C/Tiny target 
with protected flash memory. 


Architecture
Von Neumann
16-bit words


Registers
R0-R3: Data Registers (R0 and R1 as split 8-bit halves.)
A0-A1: Address Registers (A0 and A1 as combined 32-bit A1A0 register.)
FB: Frame Base
PC: 20-bit Program Counter
INTB: Interrupt Table (available as split 4-bit INTBH and 16-bit INTBL registers.)
USP: 16-bit User Stack Pointer
ISP: 16-bit Interrupt Stack Pointer
SB: 16-bit Static Base
FLG: 11-bit Flag register


Instructions
89 instructions, where instruction encoding is variable width
Opcode is 8-bit for the most frequently used
Opcode is 16-bit for the others

by Christopher Hewitt and Niccolo Izzo


Instruction Set Basics 


The first generation of R8C devices appeared in 2003 
and were marketed as a cost-reduced alternative to 
Mitsubishi’s M16C family, following the formation 
of Renesas as a joint venture between the semicon- 
ductor operations of Mitsubishi and Hitachi. The 
R8C family features the same 16-bit CISC architec- 
ture as M16C with binary level compatibility and 
the internal data bus reduced to 8 bits. These fam- 
ilies are also compatible at the assembly level with 
the M32C family. 

The instruction set is composed of 89 discrete 
instructions with many common instructions only 
requiring a single clock cycle. Instruction encoding 
has variable length and the opcode (8-bit for the 
most frequently used and 16-bit for the others) is 
followed by source and destination operands speci- 
fied through different addressing modes. 

Instruction mnemonics can be suffixed to prioritize
one of the following four possible instruction
formats, with the assembler choosing an optimal
format if one is not explicitly specified.

Generic (:G) Op-code (2 bytes), source (0-3 bytes),
destination (0-3 bytes)

Quick (:Q) Op-code with immediate data (2 bytes),
destination (0-2 bytes)

Short (:S) Op-code (1 byte), source (0-2 bytes),
destination (0-2 bytes)

Zero (:Z) Op-code (1 byte), destination (0-2 bytes)


There are also numerous addressing modes cat- 
egorized across three different types. 


General Instruction Addressing Immediate, 
register direct, absolute, address register indi- 
rect, address register relative, SB relative, FB 
relative, SP relative 


Special Instruction Addressing 20-bit abso- 
lute, address register relative with 20-bit dis- 
placement, 32-bit address register indirect, 
32-bit register direct, control register (PC, 
INTB, USP, ISP, FLG) direct, PC relative 


Bit Instruction Addressing Register direct, ab- 
solute, address register indirect, address reg- 
ister relative, SB relative, FB relative, FLG 
direct 


Registers and Calling Convention 


Be aware of different calling conventions depending
on the compiler and options used. For example, the
IAR C and C++ compiler for R8C and M16C
supports a “normal” calling convention and a “simple”
one. The normal (and default) calling convention is
optimized to use registers as much as possible (with
A0, R0, and R2 used as scratch registers), then
defers to the stack for passing parameters. The simple
calling convention, however, only passes the first
parameter through R0L, R0, or R2R0 (depending on
size), then defers to the stack for remaining
parameters. R0 and R2R0 are also used for returning values
from a function.

There are also subtle differences between Renesas’
own compilers such as the NC30 compiler used
for the M16C and R8C families, and the NC308
compiler used for M32C and certain M16C family parts.
For example, NC30 preserves registers during
function calls on the caller side, while NC308 does so on
the called side.[33]

Regardless of the compiler used, stack frame ma- 
nipulation is evident by the presence of ENTER and 
EXITD instructions to build and deallocate stack 
frames respectively. 


Memory Map 


Note that different documents conflict with each other. We
used a Chinese-language datasheet for our figure on
page 41 where things differed.[34]

Also note that this article's PoC dumps actual
code from a region that isn't supposed to be valid
for this specific part number, but is valid for
different catalog part numbers (likely sharing the same
die).

Editor's Note: We have included a die photo on
page 45 taken from processing a R5F21194, for
anyone who wishes to perform future comparisons to
other catalog part numbers.


[33] The documentation is confusing here; for further detail, see


Code Protection 


The Renesas R8C/Tiny series supports a couple
of different mechanisms for flash protection.
Serial programmer commands to access the flash,
including erasing, are completely ignored if a
custom 7-byte ID code was interleaved with entries
in the interrupt vector table at offsets 0x0FFDF,
0x0FFE3, 0x0FFEB, 0x0FFEF, 0x0FFF3, 0x0FFF7,
and 0x0FFFB. An ID code consisting of all ones (such
as when flash cells are unprogrammed from the
factory) is automatically unlocked by the boot ROM,
while any other value requires manual unlocking
with a successful ID code comparison to re-enable
flash manipulation. Parallel programmer commands
to access the flash are ignored through
configuration of the Option Function Select (OFS) register
located at offset 0x0FFFF by setting ROMCP1=0 and
ROMCR=1.[35]


Fixed Interrupt Vector Table (Flash)


0x0FFDC  Undefined Instruction      ID1 (0x0FFDF)
0x0FFE0  Overflow                   ID2 (0x0FFE3)
0x0FFE4  BRK Instruction
0x0FFE8  Address Match              ID3 (0x0FFEB)
0x0FFEC  Single Step                ID4 (0x0FFEF)
0x0FFF0  Osc. stop, watchdog, VM2   ID5 (0x0FFF3)
0x0FFF4  Address Break              ID6 (0x0FFF7)
0x0FFF8  Reserved                   ID7 (0x0FFFB)
0x0FFFC  Reset                      OFS (0x0FFFF)

Finding a Target 


Renesas is one of the leading suppliers of
microcontrollers in the world, but it's not very common to
see their microcontrollers used by electronics
hobbyists in Western countries. Mass-produced commercial
designs spanning from inexpensive toys to
fault-tolerant automotive engine control units are much
more likely to include Renesas parts.[36]


One low-cost and readily accessible product con- 
taining an R8C/Tiny microcontroller is the SA868 
radio module with integrated power amplifier. Lim- 
itations in the module's factory firmware make it 
an attractive target for modifications, but this first 
requires unlocking access to the protected contents. 


unzip pocorgtfo22.pdf m32c90-compiler.pdf, page M-70, and unzip pocorgtfo22.pdf m32-compiler.pdf, page 59.

[34] unzip pocorgtfo22.pdf r5r0c00cn.pdf

[35] See unzip pocorgtfo22.pdf r8c-hardware.pdf, section “ROM Code Protect Function” (page 250).

[36] See Bypassing the Renesas RH850/P1M-E read protection using fault injection by Willem Melching.


Address range          Region
0x00000 - 0x002FF      SFR
                       Reserved
0x00400 - *0x004FF     Internal RAM
                       Reserved
0x02400 - 0x02BFF      Internal ROM (data)
                       Reserved
0x0E000 - *0x0FFFF     Internal ROM (program)
        - 0xFFFFF      Expanded Area

Fixed Vector Table (0x0FFDC - 0x0FFFF)
Undefined instruction                          ID1 (0x0FFDF)
Overflow                                       ID2 (0x0FFE3)
BRK instruction
Address match                                  ID3 (0x0FFEB)
Single step                                    ID4 (0x0FFEF)
Oscillation stop, watchdog, voltage monitor 2  ID5 (0x0FFF3)
Address break                                  ID6 (0x0FFF7)
Reserved                                       ID7 (0x0FFFB)
Reset                                          OFS (0x0FFFF)

* Other catalog part numbers use different ranges for some sections.


Figure 9: Memory Map Summary


designed for. While the module is strictly marketed 
for use in analog FM applications, the transceiver 
component is used in more sophisticated digital ra- 
dios using 4-FSK modulation. Once the microcon- 
troller's protected flash can be unlocked, it is possi- 
ble to dump and patch the firmware or even replace 
it with a custom purpose-built one to support more 
useful and interesting digital voice and data proto- 
cols.?" 


Under the SA868 v1.1’s metal shield sits a Re- 
nesas R5R0C002SN, an R8C/1B group compati- 
ble part that was only available to customers in 
Asia. A close approximation with a publicly avail- 
able English language datasheet is the Renesas 
R5F211B2SP. The role of this microcontroller in 
the module is to expose a Hayes-style command 
set interface to control an Auctus АТ18465 RF 
transceiver, which is the same part at the core of 
many low-cost amateur handheld radios including 
the ubiquitous UV-5R, GD-77, and MD-UV380. 


Without getting too deep into radio theory, the 
5.4868 module has a lot more potential than it was 


ARCTURUS 


Via Amedei М. 8 - MILANO - Via Amedei М. 8 


37See Delorie 2009 page 5, unzip pocorgtfo22.pdf renesasflash. pdf 
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Extracting the Application 


A previously published attack for a microcontroller 
in the M16C family described a successful tim- 
ing attack against the boot ROM.?* The authors 
demonstrated a measurable time delta between the 
last cycle of the serial programming clock (SCK) 
and an output pin (BUSY) asserted while ser- 
vicing programmer commands that could be used 
to iteratively determine individual bytes of an ID 
code. This approach might have been viable for the 
R5R0C002SN since the R8C family is a close relative
of the M16C family, if not for the lack of an equiv- 
alent pin indicating busy state. It is, however, still 
possible to demonstrate whether or not the timing 
attack is portable to this target by extracting the 
same information through power analysis of the ID 
code verification process. 
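
To make the byte-at-a-time idea concrete, here is a minimal simulation, assuming a hypothetical early-exit comparison whose "time" is simply the number of bytes examined. The function names and the leak model are illustrative only; this is not the M16C boot ROM, and the R8C target discussed below does not appear to leak this way.

```python
# Simulated target: an early-exit comparison whose "time" is the number of
# bytes examined before it gives up. A stand-in for illustration only.
SECRET_ID = bytes.fromhex("4e6a4674535954")  # ID code that appears in this article's dump


def leaky_check(guess: bytes):
    """Return (unlocked, steps), where steps leaks how far the compare got."""
    steps = 0
    for g, s in zip(guess, SECRET_ID):
        steps += 1
        if g != s:
            return False, steps
    return True, steps


def recover_id(length: int = 7) -> bytes:
    """Recover the ID one byte at a time using only the leaked step count."""
    known = bytearray(length)
    for pos in range(length):
        timings = []
        for candidate in range(256):
            known[pos] = candidate
            unlocked, elapsed = leaky_check(bytes(known))
            if unlocked:
                return bytes(known)
            timings.append(elapsed)
        # The correct byte is the one whose comparison ran the longest.
        known[pos] = max(range(256), key=timings.__getitem__)
    return bytes(known)


print(recover_id().hex())
```

The search costs at most 7 x 256 attempts instead of 256^7, which is why a constant-time comparison matters.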

A relationship to power consumption can be 
measured by removing the microcontroller from cir- 
cuit and inserting a low value shunt resistor in- 
line with the power supply. Experimentation with 
added capacitance or changing shunt resistor posi- 
tion helps establish which conditions provide the 
cleanest measurements. During any unsuccessful 
unlock attempt, voltage measurements at the supply pin
expose seven evenly spaced segments, corresponding to
each byte of the ID code. This observation suggests
that the R5R0C002SN boot ROM
executes comparisons in constant time and is not 
vulnerable to the same timing attack. Brute force 
attempts are also discouraged by silently ignoring 
unlock requests after a few unsuccessful attempts. 
On a target with a known ID code, leakage from 
successful unlock attempts suggests that valid com- 
parisons are performed twice, possibly to mitigate 
against power glitches. 
Readers with experience in side channel analysis might be tempted to
calculate Pearson correlation coefficients in order to match ID code
attempts with power trace data in hopes of leaking bits or bytes from
the real ID code, but the approach seemingly does not work here.
Whether the result of high clock jitter, inadequate ADC resolution, or
just bad luck, no apparent correlation between ID code attempts and
resulting power trace can easily be identified.
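
For readers who have not done this before, the calculation looks roughly like the sketch below. It assumes a Hamming-weight leakage model for one guessed byte and a list of equal-length traces; both the model and the helper names are assumptions made for illustration, not the analysis actually run against this target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / sqrt(var_x * var_y)


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def correlate_traces(guesses, traces):
    """guesses: one guessed byte per attempt.
    traces: one list of samples per attempt, all the same length.
    Returns one coefficient per sample index."""
    model = [hamming_weight(g) for g in guesses]
    n_samples = len(traces[0])
    return [pearson(model, [t[i] for t in traces]) for i in range(n_samples)]
```

A sample index whose coefficient stands well clear of the rest would mark the moment the guessed byte is handled; here, no such peak showed up.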


ID Code Unlock: Incorrect Key 
Fault injection is another tool at our disposal for
extracting protected flash memory contents. Long 
pioneered by satellite television enthusiasts explor- 
ing conditional-access modules, fault injection at- 
tacks traditionally manipulate clocks or supply volt- 
ages as a mechanism for introducing unintentional 
behavior to a system, such as causing instructions to 
be skipped or register contents to be modified. De- 
vices like the ChipWhisperer-Lite have made these 
kinds of practical attacks significantly more ap- 
proachable for hobbyists, but don't disregard the 
price point and flexibility of a simple microcontroller 
combined with a fast switching MOSFET to mo- 
mentarily bridge a supply voltage to ground poten- 
tial. 

Each microcontroller in the R8C/Tiny family 
has an integrated ring oscillator running at roughly 
8 MHz further divided by 32 for use as the system 
clock during boot ROM execution. This clock is not 
exposed externally, so clock glitching is not the most 
convenient approach. The high degree of jitter from 
this internal clock combined with the double check of 
the ID code observed in power analysis means that 
landing a voltage glitch twice during an unlock at- 
tempt, with the correct timing, might be excessively 
difficult. But let's consider for a moment if a suc- 
cessful ID code verification is even necessary prior 
to accessing flash programming commands. Maybe 
it's really only a formality intended for diligent en- 
gineers who rigidly follow the rules outlined in the 
hardware manual. 

It is clear from the location of the fixed interrupt vector table
that the ID code is stored on the last page of flash and clear from
the programming guide that there is a flash page read command in the
boot ROM. It's not unreasonable to at least try repeatedly reading the
final page of flash without any prior ID code verification while
simultaneously sweeping glitches over the microcontroller's power
supply with varied time offsets. The programming interface's serial
transmit pin can even be used as a trigger to anchor glitches around
the page read commands.

38 See Hacking Toshiba Laptops by Serge Bazanski and Michał Kowalczyk.

Some experimentation is required to find glitch pulse lengths and time
offsets that don't stall the microcontroller yet still influence boot
ROM behavior. Keep in mind that variations in capacitance and even
temperature can easily impact results and repeatability. Since
thousands of glitch attempts might be required for a single success,
it's best to keep each attempt as short as possible: Skip unnecessary
communication steps by directly using the boot ROM's flash programming
protocol and only perform hard resets when the microcontroller is
completely stalled and not responding. With a little luck, our trusty
microcontroller confidently returns a full page of flash data, rather
than nothing.39
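
The shape of such a sweep can be sketched as follows. Everything here is hypothetical: `SimulatedTarget` stands in for real hardware, and its success window and probability are made-up numbers chosen only so that the loop has something to find. The notebook referenced in the footnote is the actual tooling.

```python
import itertools
import random

PAGE_SIZE = 256


class SimulatedTarget:
    """Stand-in for real hardware; succeeds only in a narrow, invented window."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.stalled = False

    def hard_reset(self):
        self.stalled = False

    def read_page_with_glitch(self, width, offset):
        """Issue a page read with one glitch; return page bytes or None."""
        if self.stalled:
            return None
        if width > 45:  # pulse too long: the part browns out and hangs
            self.stalled = True
            return None
        in_window = 36 <= width <= 39 and -46 <= offset <= -44
        if in_window and self.rng.random() < 0.05:
            return bytes([0xFF] * PAGE_SIZE)  # pretend the read went through
        return None  # boot ROM refused: nothing comes back


def sweep(target, widths, offsets, repeats=200):
    """Try every (width, offset) pair; reset only when the target is stalled."""
    attempts = 0
    for width, offset in itertools.product(widths, offsets):
        for _ in range(repeats):
            attempts += 1
            page = target.read_page_with_glitch(width, offset)
            if page is not None:
                return width, offset, attempts, page
            if target.stalled:
                target.hard_reset()
    return None


print(sweep(SimulatedTarget(), range(30, 50), range(-50, -40))[:3])
```

Resetting only on a stall, rather than before every attempt, is what keeps each attempt short.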


Glitching an unknown programmed R5R0C002SN sample from AliExpress

[*] bootrom: VER.1.20
[*] injecting faults...
<omitted>

[?] dumped page - width: 37.890625 offset: -44.921875 ext offset: 5420

0000FF00  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FF10  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FF20  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FF30  ff ff ff ff 53 fb 00 00 53 fb 00 00 53 fb 00 00  ....S...S...S...
0000FF40  53 fb 00 00 53 fb 00 00 b4 fa 00 00 53 fb 00 00  S...S.......S...
0000FF50  53 fb 00 00 53 fb 00 00 aa f8 00 00 53 fb 00 00  S...S.......S...
0000FF60  53 fb 00 00 53 fb 00 00 53 fb 00 00 53 fb 00 00  S...S...S...S...
0000FF70  53 fb 00 00 1b fb 00 00 ff ff ff ff ff ff ff ff  S...............
0000FF80  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FF90  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FFA0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FFB0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FFC0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000FFD0  ff ff ff ff ff ff ff ff ff ff ff ff 53 fb 00 4e  ............S..N
0000FFE0  53 fb 00 6a 53 fb 00 00 53 fb 00 46 53 fb 00 74  S..jS...S..FS..t
0000FFF0  50 fb 00 53 53 fb 00 59 53 fb 00 54 49 e0 00 f7  P..SS..YS..TI...
[!] valid idcode - 4e6a4674535954

[*] dumping entire flash...

[*] block 0 (0x0e000 - 0x0ffff):

0000E000  7b 60 6e 7c 65 3d 3f 70 7f 74 77 2f 1b 6e 1f 17  {`n|e=?p.tw/.n..
0000E010  f4 85 0f f4 92 0f f4 ba 0f f4 dd 0f f4 e1 0f f4  ................
0000E020  02 10 f4 ec 0f f4 0f 10 f4 2a 10 f4 56 10 f4 43  .........*..V..C
<omitted>

[*] block 1 (0x0c000 - 0x0dfff):

0000C000  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000C010  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
0000C020  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
<omitted>

[*] block a (0x02400 - 0x027ff):

00002400  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00002410  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00002420  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
<omitted>

[*] block b (0x02800 - 0x02bff):

00002800  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00002810  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
00002820  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
<omitted>

[*] done


39 See the Jupyter notebook: unzip pocorgtfo22.pdf r8c-glitch.ipynb

R8C/Tiny BootROM ID Code Bypass — Greetz to ShmooCon Møøse Crew


Successful glitches don’t always return meaning- 
ful data, but ID codes can be assembled from their 
expected offsets in the page, then verified through an 
unlock attempt. Eventually you'll find a match and, 
once successfully unlocked, the entire flash memory 
can be dumped or erased and reprogrammed. If 
it looks like some data is missing, try reading ad- 
ditional flash pages that aren’t officially supported 
by the part number since there’s a good chance the 
same internal die is used by several part numbers.40
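
Assembling an ID code from a glitched page is a matter of picking seven bytes out of the fixed vector table. The sketch below uses the ID2 through ID7 addresses from the memory map; the ID1 address (0xFFDF) is inferred from where the first ID byte sits in the dump, and the page is padded with 0xff where the dump is not reproduced here.

```python
# Addresses of the seven ID code bytes. ID2-ID7 come from the memory map;
# ID1 at 0xFFDF is inferred from the dump (an assumption).
ID_ADDRESSES = [0xFFDF, 0xFFE3, 0xFFEB, 0xFFEF, 0xFFF3, 0xFFF7, 0xFFFB]
PAGE_BASE = 0xFF00


def extract_id_code(page: bytes) -> bytes:
    """Pull the ID code candidate out of a dump of the last flash page."""
    if len(page) != 256:
        raise ValueError("expected a full 256-byte page")
    return bytes(page[addr - PAGE_BASE] for addr in ID_ADDRESSES)


# Last three rows of the dumped page (0xFFD0-0xFFFF), as printed in the log.
tail = bytes.fromhex(
    "ffffffffffffffffffffffff53fb004e"
    "53fb006a53fb000053fb004653fb0074"
    "50fb005353fb005953fb005449e000f7"
)
page = bytes([0xFF] * (256 - len(tail))) + tail  # padding is a placeholder
print(extract_id_code(page).hex())
```

Because a glitched read can return corrupted bytes, each candidate still has to be confirmed with a real unlock attempt.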


Proceeding with Analysis 


Once a firmware image is safely recovered, you'll almost certainly
want to inspect how it works. M16C isn't as esoteric as it might seem
and there are actually a few different options for analysis. IDA Pro
provides a disassembler for the architecture and Binary Ninja has some
support by way of a third-party plugin.41 If you're averse to
commercial software, don't forget about GNU Binutils, which supports
M16C and R8C through the m32c-elf target.
;; Recovered firmware through m32c-elf-objdump

0000fcca <.data+0xd8ca>:
    fcca:  eb 40 32 06    ldc #1586,isp
    fcce:  c7 02 0a 00    mov.b:s #2,0xa
    fcd2:  b7 04 00       mov.b:z #0,0x4
    fcd5:  b7 0a 00       mov.b:z #0,0xa
    fcd8:  eb 30 80 00    ldc #128,flg
    fcdc:  eb 50 b2 05    ldc #1458,sp
    fce0:  eb 60 00 04    ldc #1024,sb
    fce4:  eb 20 00 00    ldc #0,intbh
    fce8:  eb 10 dc fe    ldc #-292,intbl
    fcec:  fd 64 fc 00    jsr.a 0xfc64
    fcf0:  75 cf ba 04    mov.w:g #1586,0x4ba
    fcf4:  32 06
    fcf6:  75 cf bc 04    mov.w:g #128,0x4bc
    fcfa:  80 00
    fcfc:  d9 0f be 04    mov.w:q #0,0x4be
    fd00:  fd a2 fa 00    jsr.a 0xfaa2
    fd04:  eb 70 00 00    ldc #0,fb
    fd08:  fd 7a f5 00    jsr.a 0xf57a
    fd0c:  f5 03 00       jsr.w 0xfd10
    fd0f:  fb             reit
    fd10:  d9 10          mov.w:q #1,r0
    fd12:  6e fd          jne 0xfd10
    fd14:  f3             rts



Alternatively, a recently created third-party Ghidra plugin is capable
of disassembling most instructions and may help jumpstart new reverse
engineering projects through integration with Ghidra's
processor-independent decompiler.42


; after reset, this program will start
    ldc     #((topof istack)+(sizeof istack)),isp  ;set istack pointer
    mov.b   #02h,0ah
    mov.b   #00h,04h                ;set processor mode
    mov.b   #00h,0ah
.if __STACKSIZE__ != 0
    ldc     #0080h,flg
    ; set stack pointer
    ldc     #((topof stack)+(sizeof stack)),sp
.else
    ldc     #0000h,flg
.endif
    ldc     #__SB__,sb              ;set sb register
    ; If the destination is INTBL or INTBH,
    ; make sure that bytes are sent in order
    ldc     #((topof vector)>>16)&0FFFFh,INTBH
    ldc     #(topof vector)&0FFFFh,INTBL

<omitted>

    ldc     #0h,fb                  ; for debugger

; Remove the comment when you use
; global class object
; Sections C$INIT will be generated
;   .glb    __CALL_INIT
;   .call   __CALL_INIT,G
;   jsr.a   __CALL_INIT

    .glb    _main
    .call   _main,G
    jsr.a   _main


Whichever option you pick, be sure to identify 
the correct entrypoint for the binary by referencing 
the reset vector in the fixed interrupt vector table 
at the very end of flash memory. 
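
As a minimal sketch of that lookup, assuming the reset vector occupies the last four bytes of the table at 0xFFFC with a 20-bit little-endian address in the first three bytes and the OFS byte in the fourth:

```python
def reset_vector(image: bytes, image_base: int) -> int:
    """Return the entry point stored in the reset vector at 0xFFFC.

    image_base is the address that image[0] corresponds to.
    """
    offset = 0xFFFC - image_base
    raw = image[offset:offset + 4]
    if offset < 0 or len(raw) != 4:
        raise ValueError("image does not cover the fixed vector table")
    # Three address bytes, little-endian; the byte at 0xFFFF holds OFS instead.
    return int.from_bytes(raw[:3], "little") & 0xFFFFF


# Final row of the glitched dump shown earlier (0xFFF0-0xFFFF).
last_row = bytes.fromhex("50fb005353fb005953fb005449e000f7")
print(hex(reset_vector(last_row, 0xFFF0)))
```

The other fixed vectors can be read the same way, masking off the ID byte that shares each four-byte slot.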


40 See The Secret of R8C/M11A and M12A at the RVF/RC45 blog.
41 git clone https://github.com/whitequark/binja-m16c
42 git clone https://github.com/silverchris/m16c

Die photograph by Travis Goodspeed

43 See PS4 Aux Hax 2: Syscon at FailOverflow.
44 git clone https://github.com/OpenRTX/sa8x8-fw.git
An Unexpected Outcome


News of the effort to repurpose the SA868 with cus- 
tom firmware eventually found its way to the com- 
pany producing the radio modules, NiceRF Wireless 
Technology. An amateur radio enthusiast in China, 
Amo Xu, made a compelling case for the company 
to release an intentionally user-programmable vari- 
ant of the module. Shortly after their discussion, 
the company began offering the SA868S Open Edi- 
tion module. This variant is erased after quality 
assurance, guaranteeing the module is unlocked for 
reprogramming. 

The new SA868S, version 2.0, is notably different
from previous versions in that the microcontroller 
has been replaced with one from a different Renesas 
microcontroller architecture family, RL78, which is 
not vulnerable to the attack presented in this arti- 
cle. The RL78 family has, however, been explored 
in some detail in the context of the PlayStation 4 
gaming console and several useful tools already exist 
for working with that platform, including an imple- 
mentation of the debugging protocol and third party 
plugins for IDA Pro and Ghidra.43

While not as common as it should be, hardware 
reverse engineering occasionally leads to mutually 
beneficial outcomes with a manufacturer. A deep 
dive into an unfamiliar microcontroller architecture 
to improve a product’s capabilities led to a manufac- 
turer removing obstacles for experimentation. The 
availability of the SA868S Open Edition paves the 
way for user customizable firmware and has already 
motivated the creation of a free and open source al- 
ternative firmware granting complete control of all 
registers in the underlying transceiver chipset,
enabling use of digital protocols such as APRS and
M17.44

We hope that you've enjoyed this little guide to 
Renesas M16C and R8C, and that you’ll keep it 
handy when reverse engineering firmware from those 
platforms. 


22:10 A Tourist's Guide to Эльбрус


In the tradition of the many high quality tourist 
guides that have appeared in this fine publication, 
let's take a magical tour around Russia's modern 
computer architecture, the Elbrus 2000.45 46 47 


At A Glance 


Common Models 
Elbrus-1S+, Elbrus-4S, Elbrus-8S, Elbrus-8SV,
Elbrus-16S


Architecture 
Von Neumann 
Very Long Instruction Word 
Register Windowing (32-bit Base Registers) 


Registers 
g0-g31: Global Registers 
r0-r17: General Purpose (Windowed)
b0-b7: Overlay Register within Window 
Pred0-Pred31: Boolean Predicate Registers 


Address Space 
64-bit Virtual Addressing 
Unknown Physical Memory Map 


by evm 
Background & History 


Elbrus is a Russian CPU architecture that has been around in some form
for over 40 years. It started at the Lebedev Institute of Precision
Mechanics and Computer Engineering. When the Elbrus 1 debuted in 1979,
it was the first superscalar, out-of-order execution processor
developed in the Soviet Union. The architecture was extended to be a
very long instruction word (VLIW) architecture with Elbrus 3 in 1990.
Once fully integrated as a microprocessor architecture in 2001
(previous versions had used many discrete chips), the architecture
became known as Elbrus 2000, or E2K for short. Elbrus is designed in
Russia but currently manufactured by TSMC in Taiwan because of a lack
of Russian production facilities capable of producing chips at
advanced technology nodes.48


In the early '90s, the Lebedev Institute spun off a joint stock
company called the Moscow Center of SPARC Technologies (now shortened
to just MCST). MCST currently produces new Elbrus chips and
Elbrus-based PCs, laptops, and servers. Elbrus-8S and 8SV are the
current top-of-the-line processor models (eight core versions for
servers and desktops), and a lower-cost 1S+ (single core) is available
as well. Note the transliteration from Cyrillic, where the model names
appear as Эльбрус-8С, Эльбрус-8СВ, and Эльбрус-1С+, respectively.
Anecdotally, the 8S CPUs are about three times slower than a
comparable Intel CPU,49 but the draw of Elbrus is that it's a fully
domestically designed Russian processor. The Russian military has
reportedly ordered thousands of ruggedized laptops based on the
Elbrus-1S+,50 although there is no indication that the order was ever
delivered.

There is currently very little public documenta- 
tion on Elbrus because MCST controls most doc- 
umentation under nondisclosure agreements. This 
means we don’t have full processor documentation 
like we normally would for a commercial CPU. 


45 Travis Goodspeed and Ryan Speers, “A Tourist’s Phrasebook for Reversing Embedded ARM in the Dialect of the Cortex M Series,” PoC||GTFO 11:6
46 Ryan Speers and Travis Goodspeed, “A Tourist’s Phrasebook for Reversing MSP430,” PoC||GTFO 11:08
47 Chris Hewitt, “A Tourist’s Guide to Altera NIOS,” PoC||GTFO 21:7
48 Ian Cutress, “Russia's Elbrus 8CB Microarchitecture: 8-core VLIW on TSMC 28nm,” AnandTech, June 1, 2020.
49 Anton Shilov, “Russian-Made Elbrus CPUs Fail Trials, ‘A Completely Unacceptable Platform’,” Tom's Hardware, December 24, 2021.
50 Inna Sidorkova, “Цены на военные ноутбуки достигли Эльбруса,” July 9, 2018, RBC.


We used three sources of information for this article: (1) a Russian
guide to Elbrus programming and optimization published by MCST;51
(2) source code published by the OpenE2K group (a hobbyist group
seemingly unrelated to MCST); and (3) leaked Linux kernel source code.

MCST is currently on the US sanctions list, but thanks to the Reverend
and friends we had access to an Elbrus-1S+ machine and used it to play
around with some code examples. Our Elbrus was running a version of
Linux made by MCST, but other Russian Linux distros are also available
for Elbrus (e.g., Astra Linux). The Elbrus machine has a compiler
called lcc, which is the MCST compiler based on gcc. It produces
standard Linux ELF binary files. The options for disassembly at the
moment are limited to ldis, which is part of lcc, and objdump, which
is part of the binutils package put out by the OpenE2K group. ldis
produces cleaner output, including resolution of symbol names, while
objdump has a debug flag in the build that will prefix the output with
the decoded instructions in hex. Anecdotally, ldis seems to miss some
things (e.g., not disassemble all functions), although that could be
due to operator error.

In order to explore the Elbrus instruction set we updated rix's
Smashing C++ VPTRs from Phrack 56:8. That is a whole story for another
day, but you will find our code examples and the corresponding Elbrus
disassembly attached to this PDF.52


Basics of Instruction Set Decoding 


The first thing we needed to figure out was how the instruction format
works, since the official documentation left this topic out entirely.
Fortunately we found that the OpenE2K binutils release has a
preprocessor flag, ENABLE_E2K_ENCODINGS, which causes objdump to print
out the instruction bytes and their groupings.53 A version of objdump
with this flag was what we used to produce the disassembly for most of
this article.

In Elbrus documentation, the VLIW is called a “wide command” (широкой
командой). A wide command contains multiple instructions, each of
which is targeted at individual execution units in the CPU pipeline.
The documentation variously uses the terms “commands” (команд),
“instructions” (инструкций), and “operations” (операций) for the
component instructions within the instruction word.

51 unzip pocorgtfo22.pdf elbrusprog.pdf

The OpenE2K objdump code refers to the way these component
instructions are encoded as “syllables.” A nice feature of Elbrus is
that the instruction encoding is fairly simple when compared against
modern DSP architectures we've experienced. Instruction counting is an
exploitation task that can be pretty complicated on some
architectures, but not Elbrus. It's fairly simple to determine the
length of an instruction from the initial “HS” syllable (shown on
page 49).
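
A rough sketch of that length calculation follows. Only the CS0 and CS1 bit numbers (14 and 15) and the "add 1, times 8 bytes" rule come from the HS description in this article; the remaining bit positions are our assumption, obtained by packing the F, S, and NOP fields upward from bit 0 and the six ALU flags downward from bit 31. Treat them as hypothetical until checked against the OpenE2K source.

```python
def decode_hs(hs: int) -> dict:
    """Split a 32-bit HS syllable into a few of its fields.

    Bit positions other than CS0/CS1 are assumptions, not documented facts.
    """
    return {
        "F": hs & 0xF,             # assumed bits 0-3: size of F1 fragment, minus 1
        "S": (hs >> 4) & 0x7,      # assumed bits 4-6: 8-byte units, minus 1
        "NOP": (hs >> 7) & 0x7,    # assumed bits 7-9: delay cycles (max 7)
        "CS0": (hs >> 14) & 0x1,   # bit 14, as stated in the HS legend
        "CS1": (hs >> 15) & 0x1,   # bit 15, as stated in the HS legend
        "ALU": (hs >> 26) & 0x3F,  # assumed bits 26-31: one flag per ALU syllable
    }


def wide_command_length(hs: int) -> int:
    """Length of the whole wide command in bytes: (S + 1) * 8."""
    return (decode_hs(hs)["S"] + 1) * 8


example = 0x04004112  # made-up value, not taken from real code
print(decode_hs(example), wide_command_length(example))
```

Walking a code section then amounts to reading HS, skipping `wide_command_length` bytes, and repeating.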

The HS syllable determines the presence of the other instruction
syllables, which appear in a particular order. The order is: SS; ALU;
CS0; ALES half syllables 2 and 5; CS1; ALES half syllables 0, 1, 3,
and 4; AAS half syllables; a gap check; CDS; PLS; and finally LTS
(literals). Literal syllables (i.e., immediate values) occur at the
end of the syllables. The OpenE2K objdump code looks for all of the
syllable presence flags above, reads them in order (minding the
possible gap), and then compares the number of syllables read against
the size field in HS. Any extra syllables are read as literals. For
syllables that contain “half syllables” (i.e., 16-bit values), the
order of the syllables is flipped as they appear sequentially in
memory.


Byte order          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Half syllable order |   1   |   0   |   3   |   2   |

This makes more sense if you think about the bytes being read in as
4-byte little-endian values.

Word order          |       0       |       1       |
Byte order          | 3 | 2 | 1 | 0 | 7 | 6 | 5 | 4 |
Half syllable order |   0   |   1   |   2   |   3   |
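
The same reordering can be written as a few lines of code. This is a small illustration of the tables above, not code from objdump: each 4-byte group is read as a little-endian word, and the upper 16 bits are taken before the lower 16 bits.

```python
import struct


def half_syllables(raw: bytes):
    """Split raw bytes into 16-bit half syllables in their logical order."""
    if len(raw) % 4:
        raise ValueError("syllables are 4 bytes each")
    halves = []
    for (word,) in struct.iter_unpack("<I", raw):
        halves.append(word >> 16)     # upper half first: bytes 3 and 2
        halves.append(word & 0xFFFF)  # then the lower half: bytes 1 and 0
    return halves


print([f"{h:04x}" for h in half_syllables(bytes(range(8)))])
```

Feeding in the bytes 00 through 07 puts bytes 2 and 3 in half syllable 0 and bytes 0 and 1 in half syllable 1, matching the first table.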


Register Set 


Elbrus's basic registers consist of 18 general-purpose registers
(r0–r17), 32 global registers (g0–g31), and a sliding set of windowed
registers (b0–b7). More will be explained about the register windowing
in the next section. Registers are prefixed with an access width,
similar to x86.

For example, sr0 is single (32-bit) r0, and dr0 is double (64-bit) r0,
which is also the default. When the registers are used with floating
point values,

51 Murad Neumann-zadeh and Sergei Korolev, “Руководство по эффективному программированию на платформе «Эльбрус»” (Guide to Efficient Programming on the Elbrus Platform)
52 unzip pocorgtfo22.pdf vptrs.zip
53 git clone https://git.mentality.rip/OpenE2K/binutils-gdb.git


Bit 31                                                  Bit 0
| ALU | ALES | PLS | CDS | CS | C | L | NOP | S | F |

L = loop mode
NOP = number of cycles to delay (max 7)
S = size of instruction word (add 1 to get number of 8-byte fields in the instruction word)
C = presence of SS (control transfer) syllable
ALU = presence of ALU syllables (6 possible)
CS = presence of control syllables (call/return, windowing instructions) (CS0 = 14, CS1 = 15)
F = size of F1 fragment (add 1 to get number of 4-byte fields in the fragment)
ALES = presence of ALES syllables (5 possible)
CDS = presence of CDS syllables
PLS = presence of PLS syllables

Encoding of the initial HS syllable, which determines the presence of
other syllables, in an Elbrus wide instruction word. It is unclear
what the ALES, CDS, and PLS syllables are used for as we did not
generate any of those instructions in the example code.


there is an xr0, which is an 80-bit version, presumably using the long
double format from x86. According to the documentation, two double
registers can be accessed as a quad register (for example qr[i], where
i is even), but gdb on our box doesn't seem to be aware of this
notation.

CAUTION: Elbrus has a word size of 32 bits for both registers and
memory accesses, so the notion of single/double/quad on Elbrus is
double what you might be used to on 64-bit x86, where the length of a
word dates back to its early ancestors.


Basic Arithmetic and Memory Operations in Elbrus


Here we show the various ALU register operations: 


Integer Arithmetic Instructions
  add           Addition
  sub           Subtraction
  rsub          Reverse subtraction
  umul/smul     (Un)signed integer multiplication
  udiv/sdiv     (Un)signed integer division
  umod/smod     (Un)signed modulo
  sxt           Sign extend

Bitwise Operations
  and/andn      Boolean and/nand
  or/orn        Boolean or/nor
  xor/xorn      Boolean xor/xnor
  shl/shr       Shift left/right
  scl/scr       Shift cyclic
  sar           Shift right arithmetic (signed)
  insf/getf     Set/get bitfield

Floating Point Arithmetic Instructions
  fadd          Floating point addition
  fsub          Floating point subtraction
  frsub         Floating point rev. subtraction
  fmax/fmin     Floating point maximum/minimum
  fmul/fscale   FP mult. / mult. by power of 2
  fdiv/frcp     Floating point division/reciprocal
  fsqrt         Floating point square root


A basic ALU instruction looks like this:

ALS0 1181d48d addd,0 %dr1, _f16s,_lts0hi 0xfff0, %dr13

This translates to “add double precision, using channel 0, the 64-bit
register %dr1 to the signed 16-bit value 0xfff0, and place the result
in dr13.” There are six ALU channels, so you can do up to six ALU
instructions in one wide instruction. There is no simple register
“move,” so the compiler tends to use a zero-add as a “move”
instruction. The full list of ALU register operations is shown in the
table above. Notice that this is a fairly small number of operations.
Outside of the VLIW construct, the Elbrus instruction set feels pretty
RISC-like.

Memory operations are also pretty simple. Op- 
erations are load and store with a variety of width 
specifiers. Addresses can be a register plus an im- 
mediate offset, or the sum of two registers. Here is 
an example of a basic load operation: 


ALS0 678dc08c ldd,0 %dr13, 0x0, %dr12

ldis renders this (a bit more clearly) as:

ldd,0 [ %dr13 + 0x0 ], %dr12

This translates as “load double word (64 bits) from memory, using
channel 0, from the address dr13 + 0, and store in register dr12.”
There are also array memory load/store operations (ldaa/staa) that
work similarly. As far as we can tell from the documentation, the
array mode doesn't add any special addressing. It's still the sum of
two registers or a register plus a constant; the main advantage is
that there's a built-in post-increment operation.


Register Windowing 


Probably the simplest way to understand register 
windowing is that it functions similarly to local vari- 
ables within a stack frame in a memory stack. In 
processors without windowing (which is nearly all 
processor families with some notable exceptions, like 
SPARC and Itanium), we are used to code transfer- 
ring registers around between function calls, mean- 
ing that some registers need to be saved on the mem- 
ory stack or transferred to nonclobbered registers 
(those guaranteed by the application-binary inter- 
face to not get modified by the called function) prior 
to a function call. 

A function of reasonable complexity will save 
registers it's not supposed to clobber to the memory 
stack so that they are available for calculations and 
then restore the previous values from the stack at 
the end of the function. Register windowing aims to 
reduce some of this register bucket-brigading over- 
head by making the register set function more like 
a memory stack. In the same way that a function 
allocates a stack frame for itself, a function allocates 
a window of registers. 

On Elbrus, this is accomplished in a function prologue with the setwd
instruction. After setwd executes, the “register” r0 is really a
reference to the first item in the register window. Now the function
can use r0 to r<N> without having to save any registers from the
calling function. How about parameter passing in registers? Just like
architectures with stack-passed parameters, we need a calling function
and the called function to share an overlapping area. This is done
with a wbs parameter in the call instruction. wbs indicates the start
of shared function parameters within the current window. After a call,
r0 in the called function now refers to the base of the shared
parameter area. This is illustrated here, where a caller function has
a window of size N and calls a function that allocates a window of
size K:


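Caller:  | dr0 ..................................... dr(N-1) |
         |<------ wbs ------>|      Parameter area           |
Callee:                      | dr0 ..................................... dr(K-1) |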
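
The overlap can be modeled in a few lines. This is a toy sketch, not a description of the hardware: `RegisterFile` is a made-up class, window sizes are not enforced, and wbs is treated as a plain register count for simplicity.

```python
class RegisterFile:
    """Toy model of overlapping register windows."""

    def __init__(self, size=64):
        self.regs = [0] * size
        self.base = 0     # physical index of the current window's r0
        self.saved = []   # stand-in for the hardware call chain

    def r(self, n):
        return self.regs[self.base + n]

    def set_r(self, n, value):
        self.regs[self.base + n] = value

    def call(self, wbs):
        """Enter a callee whose r0 is the caller's register number wbs."""
        self.saved.append(self.base)
        self.base += wbs

    def ret(self):
        """Return to the caller's window."""
        self.base = self.saved.pop()


rf = RegisterFile()
rf.set_r(4, 111)      # caller places two parameters at the top of its window
rf.set_r(5, 222)
rf.call(wbs=4)
print(rf.r(0), rf.r(1))   # callee sees them as r0 and r1
rf.set_r(0, 333)      # callee leaves a return value in its r0
rf.ret()
print(rf.r(4))        # caller finds it where parameter 0 was
```

No register is copied on the call or the return; only the base moves.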


Elbrus also offers a sliding or mobile base register (b), which a
function can use within its own function window. The base register is
just an overlay on the existing register window; it points to a given
register within the window. Accesses to registers with the base
register use an array notation. For example, db[0] means “access first
double register (64 bits) at the base pointer.” The instruction setbn
is used to set this pointer.

The operand rbs (the offset to set b from the base of the window) is
also specified in quadwords. In practice, it looks like lcc uses the
base pointer to point to the parameter area, so db[0], db[1], db[2],
etc. are parameter 0, parameter 1, parameter 2, etc. for functions
that are about to be called.

Since functions return values in dr0, this also means that db[0] holds
the return value from the calling function's point of view.


Calls and Branches in Elbrus 


Calls and branches are somewhat unique in Elbrus: they occur in two
phases instead of in a single instruction the way they do on most
architectures. Elbrus uses the disp instruction to set up any kind of
control transfer instruction. This sets the ctpr1 register to the
target address. The call instruction then executes the control
transfer. This allows the pipeline to get a little bit of advance
warning for the call, allowing it to set up state for the target
function, which can be undone or ignored if the call doesn't execute.
The documentation refers to the ipd portion as specifying the “swap
depth,” but it is unclear what this means.


Return instructions happen similarly with a 
return instruction first that sets up the return and 
the ct instruction to execute control transfer. (This 
is also used for branches, as we'll discuss in the next 
section.) Notice that the function never seems to do 
anything with the return address. This is because 
Elbrus has a completely separate call chain stack, 
called the Procedure Chain Stack (PCS). Architec- 
turally this is referenced via the Procedure Chain 
Stack Pointer (PCSP) register. The PCSP is not 
accessible from user mode; rather, it is set up by 
the kernel similarly to how user stack memory gets 
set up on a per-process basis. 


The Procedure Chain Stack (PCS) 


The PCSP is pretty simple: it's a 128-bit register with 64-bit “lo”
and “hi” parts. The “lo” part contains the base address, and the “hi”
part contains an index to the current frame.54 It is unclear at this
point what the “rw” field is actually used for.


(gdb) info registers pcsp_lo
pcsp_lo        0x1800c2e00002b000  1729596524238909440
  base         0xc2e00002b000      214267328638976
  rw           0x3                 3

(gdb) info registers pcsp_hi
pcsp_hi        0x200000000060      35184372088928
  ind          0x60                96
  size         0x2000              8192
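
The field boundaries can be worked out from the gdb output alone. The sketch below encodes one reading that reproduces the numbers shown: base in the low 48 bits of pcsp_lo, rw in bits 59 and 60, ind in the low 32 bits of pcsp_hi, and size in the high 32 bits. These positions are inferred from this one sample, not taken from documentation.

```python
def decode_pcsp(pcsp_lo: int, pcsp_hi: int) -> dict:
    """Split the two PCSP halves into the fields gdb prints.

    Field positions are inferred from a single gdb sample (an assumption).
    """
    base = pcsp_lo & ((1 << 48) - 1)
    rw = (pcsp_lo >> 59) & 0x3
    ind = pcsp_hi & 0xFFFFFFFF
    size = pcsp_hi >> 32
    return {
        "base": base,
        "rw": rw,
        "ind": ind,
        "size": size,
        "current_frame": base + ind,  # current frame address = base + index
    }


fields = decode_pcsp(0x1800C2E00002B000, 0x200000000060)
for name, value in fields.items():
    print(f"{name:14s} 0x{value:x}")
```

With 32-byte frames, an index of 0x60 would put three frames below the current position.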


The Linux kernel source code shows the format of the stack frames, in
the form of the e2k_mem_crstack struct. Each frame is 32 bytes and
consists of four saved 64-bit register values, the “lo” and “hi” parts
for cr0 and cr1, respectively. Again we are left without documentation
on what exactly the cr0 and cr1 registers do, but they must be
involved in control transfers. The Linux code shows that cr0 “hi”
contains the return address, and cr1 contains a bunch of fields
pertaining to the current procedure's register window.

Here is the definition of e2k_mem_crstack (PCSP frame structure) in
the E2K Linux kernel (arch/e2k/kernel/e2k_syswork.c):


typedef struct e2k_mem_crstack {
    e2k_cr0_lo_t cr0_lo; // pf?
    e2k_cr0_hi_t cr0_hi; // return address
    e2k_cr1_lo_t cr1_lo; // mess of fields - includes
                         // interrupt enable flags
    e2k_cr1_hi_t cr1_hi; // more fields - includes register
                         // window and stuff
} e2k_mem_crs_t;
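
As a rough illustration of that layout, the following Python sketch unpacks one 32-byte frame into its four 64-bit words. It assumes a little-endian layout and does not attempt to decode the bitfields inside each word; `parse_crs_frame` and the sample values are invented for the example.

```python
import struct

# Field order follows the e2k_mem_crstack struct quoted above.
# Little-endian storage is an ASSUMPTION of this sketch.
CRS_FORMAT = "<4Q"  # cr0_lo, cr0_hi, cr1_lo, cr1_hi


def parse_crs_frame(raw):
    """Unpack one PCS frame into its four saved 64-bit register words."""
    if len(raw) != struct.calcsize(CRS_FORMAT):
        raise ValueError("expected a 32-byte frame")
    cr0_lo, cr0_hi, cr1_lo, cr1_hi = struct.unpack(CRS_FORMAT, raw)
    return {"cr0_lo": cr0_lo, "cr0_hi": cr0_hi,
            "cr1_lo": cr1_lo, "cr1_hi": cr1_hi}


# Made-up frame contents, purely for illustration.
sample = struct.pack(CRS_FORMAT, 0x1, 0x10D48, 0x2, 0x3)
frame = parse_crs_frame(sample)
print({name: hex(value) for name, value in frame.items()})
```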


54Current frame address = base + index. 
So the return %ctpr3 instruction is essentially
saying “pop the current frame off the PCS into cr0
and cr1 and stick the return address in ctpr3.”


Branching 


A similar construction is used for basic branches. Rather than a
flags or condition register as in x86 or ARM, VLIW processors often
have a full set of condition registers called “predicate”
registers. These allow the compiler to set up a sequence where
multiple comparisons can happen in advance of a branch, and then a
branch can be based on multiple predicates, or a sequence of
branches can occur using the different predicate registers.

Here’s a common design pattern seen in Elbrus 
branches. The following is essentially implementing 
if (condition) { function(); } in C. 


disp %ctpr1, 0x10d48
cmpedb,0 %dr0, 0x0, %pred0
ct %ctpr1 ? %pred0


First the disp instruction indicates to the pipeline the control
transfer target, the function address. Then in the cmpedb
instruction %dr0 is compared to 0 and the result placed in %pred0
(true or false). Finally if %pred0 is true then the ct instruction
causes a control transfer, otherwise we fall through to the next
instruction.


Conclusion 


Elbrus processors are pretty capable and make decent Linux
machines. While the Elbrus CPU may be underpowered compared with
similar Intel or ARM server processors, given the Russian
geopolitical situation, these guys are going to stick around for a
while. Elbrus’s VLIW architecture and register windowing will pose
additional challenges for exploit writers. Fortunately, the Elbrus
component instructions are very RISC-like, despite the wide command
format.

In this article, we’ve explored the basics of the instruction set
and the PCS using publicly available documentation. There’s a lot
more to learn, however. We'll need some full documentation to start
plumbing the depths of things like virtual memory, interrupt and
exception handling, and the boot process.


22:11 Janus Polyglot

by Harvey Phillips


Who left among you holds any faith in the empty promises of
filetypes? Who has yet to accept that beauty is in the eye of the
parser? I hope that this gentle stroll through 512 harmless looking
bytes will dispel any remaining myths that you, dear neighbours,
continue to clutch to your collective chests.

Regular readers of this fine journal may have seen my and
@netspooky’s articles in PoC||GTFO 21:09 and 21:10 respectively.
Those were write-ups for the Binary Golf Grand Prix back in 2020
where the challenge was to produce the smallest palindromic binary.
2021’s edition of this wonderful challenge pitted competitors
against one another in a battle to produce the smallest polyglot
binary. There were two possible avenues of attack that were scored
separately: you could either connive the smallest polyglot that was
executable as a binary, or rack up points for every parser that
successfully processed your entry. We decided to normalize scores
by filesize for this second category, so that the emphasis was
still on as small a collection of bytes as possible.

Feeling drawn towards this latter category, and seeing as the
competition’s name begins with the word “binary,” I chose an x86
bootloader as my host binary. For those who haven’t delighted in
the pleasures of 16-bit real-mode assembly, a bootloader for x86
machines is a 512-byte blob that ends in 0x55aa. Execution begins
at offset 0x0. That’s it.

Ordinarily such a bootloader would be responsible for loading some
more bytes into memory from a hard drive and jumping to it, maybe
setting up a stack or other registers along the way. The nice thing
about choosing an essentially format-less format is that I can
shift around the code and data portions of the bootloader to make
way for what is to come. I just have to fit in an appropriate jmp
instruction to keep the execution flow flowing.

So, what does this bootloader do? I thought it’d be fun to have a
single string in the polyglot that I could either print for
executables, or extract for archive formats. With this in mind, I
reused some old 16-bit real mode assembly I wrote for printing
strings to the screen. Printing a string to stdout in Linux is very
straightforward thanks to the blessing of syscalls. (We do it later
on with just a few lines.) However, 16-bit real mode affords us no
such niceties.

The very rough analogue of the syscall in 16-bit x86 is the
interrupt. Your BIOS may be getting old now, but it still offers a
wealth of prewritten routines for you to use. Calling a routine via
an interrupt is strikingly similar (for good reason!) to using a
syscall: set a value corresponding to the routine you want in ah,
arguments in bl, bh, etc., and throw the interrupt. As an example,
let’s look at the very first of my routines that the bootloader
portion will call into, clearScreen:


pusha               ; Save state
mov ah, 0x6         ; "Scroll Up Window" routine
xor al, al          ; Number of lines to scroll
                    ; (0x0 is the whole screen)
mov bh, 0x03        ; Colours: fg cyan, bg black
xor cx, cx          ; (CH,CL) = coordinates of
                    ; upper left corner
mov dx, 0x184f      ; (DH,DL) = coordinates of
                    ; lower right corner
int 0x10            ; Graphics Interrupt
popa                ; Restore state
ret


Pretty straightforward, right? Routine 0x6 from 
interrupt 0x10 is Scroll Up Window. We set some 
arguments in the other registers and then kick things 
off with int 0x10. The reason we need to scroll the 
screen at all is because it's usual for the BIOS to 
have left some text in the screen buffer as it loads, 
and we want to get rid of it. 
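
For anyone squinting at the magic numbers in that routine, here is a small Python sketch that unpacks them. It assumes the standard VGA text-mode convention of background in the high nibble and foreground in the low nibble (which would make 0x03 cyan text on black), and ignores the blink bit; both helper names are made up for illustration.

```python
def split_attribute(attr):
    """VGA text attribute: high nibble background, low nibble foreground."""
    return {"background": attr >> 4, "foreground": attr & 0x0F}


def split_corner(reg):
    """A 16-bit pair such as CX or DX holds (row, column) as (high, low) byte."""
    return reg >> 8, reg & 0xFF


attribute = split_attribute(0x03)   # value loaded into BH
upper_left = split_corner(0x0000)   # value left in CX by xor cx, cx
lower_right = split_corner(0x184F)  # value loaded into DX

print(attribute, upper_left, lower_right)
```

The lower right corner decodes to row 24, column 79, which is the last cell of an 80x25 text screen.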

Once we've cleared the screen, we use another 
BIOS routine to set the cursor position, then we 
store the memory location of our string in the si reg- 
ister before calling our printString function. (Yes 
— the BIOS does not provide a routine for printing 
strings!) However, it's easy enough as we are pro- 
vided with a Display Character (TTY Output) rou- 
tine by the Graphics Interrupt 0x10. So, we simply 
loop over the bytes of our string, calling this BIOS 
routine each time until we hit a NULL byte. Just 
for added panache, I inserted a delay routine in be- 
tween printing each character. 

Running the polyglot in QEMU with
qemu-system-x86_64 janus.com will spell out the
string.55


COM Shenanigans 


How different really is a bootloader to a COM file? 
A COM executable doesn’t need that pesky 0x55aa 
at the end, and there isn’t a hard byte count to deal 
with. However, if you take an x86 bootloader like I 
had started off with and run it in dosbox, you don’t 
see any output. No errors either, but what has hap- 
pened to my beautiful string that the BIOS prints? 
The answer lies in the console buffer. Despite its 
lowly appearance to today’s behemoths, DOS is in- 
deed an operating system (hence the letters O and 
S), and it does perform some slight attempts at 
memory management. The console buffer where we 
enter commands and see their output in DOS is not 
mapped to the same memory as the BIOS’s TTY 
output buffer. This means that our assembly is still 
writing our string, but to somewhere else in memory 
that doesn’t show on the screen! 

Fortunately, one of the many blessings of DOS 
is interrupt 0x21. One of the routines provided by 
this interrupt gives the ability to write a string to the 
DOS equivalent of stdout. The only thing we need to 
be aware of is that this routine expects such strings 
to be $-terminated. Yet more fortune is at our door 
upon discovering that interrupt 0x21 isn’t mapped 
to anything by the BIOS — int 0x21 doesn’t do 
anything if we aren’t in DOS! 

By modifying our printString routine in our 
source, we can first print the string in a DOS man- 
ner, and then in a BIOS manner. All we need to 
do is append a $ to our string (after the null-byte 
that the BIOS routine looks for so we don’t see it 
in either output), and remember that the offset to 
the string in memory is different in DOS than it is 
in BIOS. 

While I used nasm for the fine-grained byte con- 
trol it gave me, I opted to use its org directive to 
tell it to compute offsets relative to 0x7c00, the 
bootloader load address on x86 machines. This 
means that any other offsets to the string for non- 
bootloader sections of executable code would need 
to be calculated manually. For DOS, this is no has- 
sle as binaries are loaded at address 0x100, meaning 
I only have to add 0x100 to whatever the file offset 
of my string is. 
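
The offset bookkeeping is easy to get wrong, so here is a tiny Python sketch of it. The file offset 0x111 is the position the string ends up at in the finished polyglot (it matches the 0x211 that the DOS code loads into dx); the function name is invented for this example.

```python
STRING_FILE_OFFSET = 0x111  # where the string sits in the finished file

LOAD_BASES = {
    "DOS .COM": 0x0100,          # COM binaries are loaded at 0x100
    "BIOS boot sector": 0x7C00,  # boot sectors are loaded at 0x7c00
}


def in_memory_address(file_offset, base):
    """Address of a byte once the whole file is loaded at `base`."""
    return base + file_offset


addresses = {name: in_memory_address(STRING_FILE_OFFSET, base)
             for name, base in LOAD_BASES.items()}
for name, address in addresses.items():
    print(f"{name}: {address:#06x}")
```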
So, the printString routine ends up like this:


printString:
    ; DOS Version
    push cs             ; Set DS = CS
    pop ds
    mov dx, 0x211       ; Offset to string
    mov ah, 0x9         ; 0x9: Write to DOS stdout
    int 0x21            ; DOS Interrupt
    mov ax, 0x4c02      ; 0x4c: DOS Exit
                        ; 0x02: Return Value
    int 0x21            ; DOS Interrupt
    ; BIOS Version
    pusha
.loop:
    lodsb               ; Load char from SI to AL
    test al, al         ; Check for null-byte
    jz .end
    call printChar      ; Print it.
    call delay          ; Lazy animation effect
    jmp .loop
.end:
    popa
    jmp waitForKeypress ; Will loop if held


One final caveat for DOS fun: we need the file to have a .com file
extension! Fortunately, none of the other parsers that janus
supports had a file extension as a hard requirement.


We can test the COM portions work as expected 
by running janus.com under DOSBox. 


ELF Shenanigans 


From chaos, evolved structure — and so we shall 
also find that from the disarray of real-mode, we are 
able to find order in the ELF specification. 


An ELF file contains a great deal of structure, 
but many fields in its various headers are ignored by 
the Linux kernel’s ELF loader. We can play this to 
our advantage by populating those fields with con- 
tent for other parsers! For a recent overview, I sug- 
gest that readers take a look at tmp.0ut’s Issue 1:1 
where I peruse the various fields in the ELF and 
program headers, foraging for those that we are free 
to do with as we like — without affecting execution. 
For a more fiendish appraisal of these fields, I highly
recommend @netspooky's series of blogs chronicling
his journey to produce an 85-byte ELF.56


[Figure 10 is an annotated hex dump of the 16-bit code, pairing
each routine's bytes with its disassembly: printChar (0x28),
setCursor (0x3A), clearScreen, printString (0x8A), the string
(0x111), delay, bootloader (0x1B0) and waitForKeypress (0x1F5),
ending with the 0x55AA signature.]

Figure 10: COM and MBR’s side of Janus
For the attendees in the back who are not fully acquainted with
the internals of ELF, here is a very brief overview of the parts
relevant to us:


• The ELF header (\x7fELF) must begin at offset 0x0


• The e_phoff field of the ELF header is a file offset to the
  first program header


• The first (and in our case, only) program header will detail
  where our x64 Linux assembly can be found, and where it is to be
  loaded in memory


The important takeaway here is that, although 
our ELF header has to begin at offset 0x0, the pro- 
gram header can appear much later because we pro- 
vide the ELF parser with an offset to it. However, 
we do have a potential issue: we already have some- 
thing at offset 0x0, the entry point to the BIOS and 
COM assembly! 

The first few bytes of the ELF header (and there- 
fore any valid ELF file) are \x7fELF, which disas- 
semble as 16-bit real-mode instructions to: 


jg 0x47 
dec sp 
inc si 


So, upon our dutiful BIOS loading this particu- 
lar collection of bytes into memory and jumping to 
offset 0x0, it will immediately jump to offset 0x47, 
thanks to how the EFLAGS register is initialized 
at boot. (At least in SeaBIOS that QEMU uses — 
I’d be very interested if any neighbours know of any
variance in this observation!) Therefore, all we are 
required to do in order to overcome this calamity is 
move our real-mode assembly elsewhere, and place 
yet another jmp to it at offset 0x47. This way, after 
bouncing around a few times, our BIOS and DOS 
functionality is preserved. 
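
To see where 0x47 comes from, here is a minimal Python sketch that decodes the first two bytes of the magic as a short conditional jump. It only handles the two-byte Jcc rel8 encodings (opcodes 0x70 to 0x7F); `decode_short_jcc` is a name made up for this example.

```python
ELF_MAGIC = b"\x7fELF"


def decode_short_jcc(code, offset=0):
    """Decode a two-byte Jcc rel8 instruction; return (opcode, target)."""
    opcode = code[offset]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional jump")
    rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
    # The displacement is relative to the end of the two-byte instruction.
    return opcode, offset + 2 + rel8


opcode, target = decode_short_jcc(ELF_MAGIC)
print(hex(opcode), hex(target))
```

Opcode 0x7f is jg, and the 'E' (0x45) that follows is its displacement, so the target is 2 + 0x45 = 0x47.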

Populating the beginning of our file with an ELF 
header, and armed with a list of fields that we know 
are ignored by the Linux loader, we can fill in sev- 
eral gaps with more interesting things. At this stage 
of my design, I simply left these fields with X’s so 
that I could come back later and put something fun 
in its place. Several of the real-mode routines are
small enough that they fit in overlooked uint64_t
fields. Can you spot them all?

Lastly, an ELF that presents itself as executable 
in its header requires something to execute! Run- 
ning with the same theme of printing the string al- 
ready present in the file, I used: 
mov al, 0x1          ; SYS_WRITE
mov di, ax           ; Write to stdout
                     ; (file descriptor 1)
mov esi, 0x400111    ; Virtual memory address of the
                     ; string: 0x400000 + file offset
mov dl, 0x32         ; String length
syscall
mov al, 0x3c         ; SYS_EXIT
inc di               ; Return value 0x2
syscall


Notice that we have to calculate the virtual ad- 
dress of the string manually again! The string ap- 
pears at file offset 0x111, and our ELF is loaded to 
address 0x400000. Adding the two gives us the right 
address. 

As a final touch, we can now set the size of our file to be loaded
in the p_filesz and p_memsz fields of the program header, set
p_offset to 0x0 so we load the entire 512 bytes, and at long last
we can set e_entry so that the Linux loader knows what virtual
memory address to jump to after loading our ELF into memory.
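
Pulling those fields together, the sketch below packs a bare 64-bit ELF header and a single LOAD program header with Python's struct module. It is not the layout of janus itself: the program header offset, the alignment and the code offset of 0xAA are illustrative assumptions, and in the real polyglot many of the "ignored" fields hold other content.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes

LOAD_ADDR = 0x400000   # virtual address the file is mapped at
FILE_SIZE = 0x200      # the whole 512-byte polyglot
CODE_OFFSET = 0xAA     # ASSUMED file offset of the x64 code
PHDR_OFFSET = 0x40     # HYPOTHETICAL placement of the program header

ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian
ehdr = EHDR.pack(
    ident,
    2,                        # e_type: ET_EXEC
    0x3E,                     # e_machine: EM_X86_64
    1,                        # e_version
    LOAD_ADDR + CODE_OFFSET,  # e_entry
    PHDR_OFFSET,              # e_phoff
    0, 0,                     # e_shoff, e_flags
    EHDR.size, PHDR.size, 1,  # e_ehsize, e_phentsize, e_phnum
    0, 0, 0,                  # e_shentsize, e_shnum, e_shstrndx
)
phdr = PHDR.pack(
    1,                        # p_type: PT_LOAD
    7,                        # p_flags: R+W+X
    0,                        # p_offset: map from the start of the file
    LOAD_ADDR, LOAD_ADDR,     # p_vaddr, p_paddr
    FILE_SIZE, FILE_SIZE,     # p_filesz, p_memsz
    0x1000,                   # p_align (illustrative)
)

e_entry = EHDR.unpack(ehdr)[4]
print(len(ehdr), len(phdr), hex(e_entry))
```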

To test things are as they should be, we can run
the binary in any x64 Linux distro.


RAR Shenanigans 


Long time neighbours will no doubt have seen several polyglots
over the years incorporating the RAR file format. It was my
intention all along for each of the incorporated file formats to
make use of the same string over and over again, either printing it
or decompressing to it. Fortunately, RAR (and as we'll see later,
ZIP) supports containing files without compression, meaning we can
just dress up our string with the appropriate structures and unrar
should play fair!

For anyone looking to get a decent handle on the 
RAR format, Ange Albertini's poster on page 57 is 
an invaluable first step. Looking at this, we see a 
reasonably straightforward structure to the file. One 
of the several fun things about the RAR format is 
that the Rar! magic can appear at any offset in the 
file, which means we aren't bound to place the RAR 
part of the file at any particular location. 

However, unlike in the executable portions of janus, we can't
point the unrar parser to any location we like for our
(un)compressed data. Indeed, the RAR File Header must immediately
precede the data, and the Archive End structure immediately follows
it. This is one of the first hard restrictions on our binary. We
have a whole 0x3d bytes before our string, and another 0x7 bytes
after it. If we want to relocate our string later, we have to move
all these surrounding bytes with it.


[Figure 11 is an annotated hex dump of the ELF header, the program
header and the x64 code. Legible fields include e_type 2 (ET_EXEC),
e_machine 0x3E (EM_X86_64), e_version (ignored), e_entry pointing
at the code at file offset 0xAA, e_phentsize 0x38, p_type 1 (LOAD),
p_flags XWR and p_vaddr 0x400000.]

Figure 11: ELF’s side of Janus


[Figure 12 is a poster dissecting a minimal RAR v4 archive holding
rar4.txt: the Rar! magic, a Main header (block type 0x73), a File
header (block type 0x74, method 0x30 uncompressed, CRC32
0x982134A1, timestamp 2020-01-18 19:08:40), the file data, and an
End of archive block (block type 0x7b).]

Figure 12: Ange Albertini's Poster on RAR Format


There is one slight loophole here that we will cer- 
tainly play to our advantage when it comes to ZIP 
shenanigans: the field at the end of the File Header, 
just before our string begins, is the filename. Ordi- 
narily, the filename would be just that, the filename. 
There is even a separate field for the filename length,
so we don't have to null-terminate it or anything like 
that. It turns out that the filename can actually be 
anything we want—including non-printable charac- 
ters! 


There is a pretty big downside to all of this RAR business.
Although we control the size of the data in the File Header, the
unrar parser will not tolerate any junk between the end of the
compressed data and the start of the Archive End. Therefore,
extracting our string with unrar will include the \n\r\0$ bytes in
its output. I thought about possible ways around this due to its
aesthetically displeasing nature, but it seems to be a necessary
evil.


There were two major stumbling blocks I found along the way. The
first was the CRC. In the format specification, it occupies the
first two bytes in each of the Main Header, File Header and Archive
End structures. Leaving these bytes as NULLs made unrar complain
about a CRC error, so I was reasonably confident that the rest of
the bytes were correct. I had seen in various sources that the CRC
was a CRC16, but after trying several times with different regions
of bytes, and different polynomials, I couldn't find anything that
worked.

Eventually I resorted to RTFM'ing and I dragged up the UnRAR
source code. This is found in rawread.cpp.


// RAR 1.5 block CRC.
uint RawRead::GetCRC15(bool ProcessedOnly) {
  if (DataSize<=2)
    return 0;
  uint HeaderCRC=CRC32(0xffffffff,&Data[2],
      (ProcessedOnly ? ReadPos:DataSize)-2);
  return ~HeaderCRC & 0xffff;
}


After smacking my head against the desk a few times, I tried
computing the CRC32 of the Main Header, and chopped off the top two
bytes to obtain 0x90cf, precisely the CRC of the Main Header from
the reference I used. A truncated CRC32 is most certainly not the
same as a CRC16! Had I begun by looking at the unrar source code
instead of trying to brute force various CRC16 polynomials to find
a match where there was none, I would have saved myself several
evenings. Fortunately, the Python zlib library offers a crc32()
function which precisely computes the CRC we need:


>>> import zlib
>>> header = bytes.fromhex(
...     '7300000d00000000000000')
>>> hex(zlib.crc32(header) & 0xffff)
'0x90cf'
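
The same check can be wrapped up so that it validates a whole block, stored CRC included. This is a sketch rather than anything lifted from UnRAR: the function names are invented, and it assumes the stored CRC is the little-endian uint16 at the very start of the block, as in the Main Header above.

```python
import zlib


def rar15_header_crc(header_after_crc):
    """Low 16 bits of the CRC32 over everything after the CRC field."""
    return zlib.crc32(header_after_crc) & 0xFFFF


def block_crc_ok(block):
    """Compare the stored little-endian CRC with the computed one."""
    stored = int.from_bytes(block[:2], "little")
    return stored == rar15_header_crc(block[2:])


# Stored CRC bytes (cf 90) followed by the Main Header bytes used above.
main_header = bytes.fromhex("cf90" "7300000d00000000000000")
crc_ok = block_crc_ok(main_header)
print(crc_ok)
```

The same function should apply to the File Header and Archive End blocks once their remaining fields are final.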


The second confusing feature of the RAR format was the datetime
format in the timestamp field of the File Header. Eventually, I
found it documented in one of the Kaitai Struct examples.57 It’s
just a bitfield, common in DOS-land. Both the date and time occupy
a uint16 each.


year = ((date & 0b1111111000000000) >> 9) + 1980 
month = (date & 0b0000000111100000) >> 5 
day = (date & 0b0000000000011111) 
hour = (time & 0b1111100000000000) >> 11 
minute = (time & 0b0000011111100000) >> 5 
second = (time & 0b0000000000011111) * 2 
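
Wrapped into functions, the bitfield above round-trips as in this Python sketch. The helper names are made up; the sample date is the timestamp of the example archive in Ange's poster, encoded here by the sketch itself rather than read out of any file.

```python
def encode_dos_datetime(year, month, day, hour, minute, second):
    """Pack a timestamp into the (date, time) uint16 pair described above."""
    date = ((year - 1980) << 9) | (month << 5) | day
    time = (hour << 11) | (minute << 5) | (second // 2)
    return date, time


def decode_dos_datetime(date, time):
    """Inverse of encode_dos_datetime; seconds have 2-second resolution."""
    return (
        ((date >> 9) & 0x7F) + 1980,
        (date >> 5) & 0x0F,
        date & 0x1F,
        (time >> 11) & 0x1F,
        (time >> 5) & 0x3F,
        (time & 0x1F) * 2,
    )


date, time = encode_dos_datetime(2020, 1, 18, 19, 8, 40)
decoded = decode_dos_datetime(date, time)
print(hex(date), hex(time), decoded)
```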

To check that things are working properly, run
unrar p janus.com, which happily produces our string,
with the unfortunate extra $ on the end.


ZIP Shenanigans 


If you are not yet acquainted with the details of the PKZIP
format, and felt that incorporating a RAR into our polyglot was
intricate, I have bad news for you. But the PKZIP format actually
lends itself very nicely to polyglots! The thing that makes it
unique (at least in my experience) is that a proper PKZIP parser
will process a file backward. Typically, we think of parsers as
looking for some magic value which indicates the start of the data
they should parse. PKZIP flips everything on its head and instead
looks for the End of Central Directory signature, which comes at
the end of the file.

In this End of Central Directory, there is a file offset and size
of the Central Directory. The Central Directory holds all the
information about our (un)compressed files contained within,
including their filenames, and CRCs. (This time around, it’s just a
CRC32.) Also included in this directory are offsets to our data,
which is always preceded by a Local File Header.
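
The backward walk can be sketched in a few lines of Python. This is a simplified reading of the format, not a full parser: it ignores ZIP64 and takes the last PK\5\6 signature in the file at face value. The test archive is generated on the fly with the standard zipfile module, and `find_central_directory` is a name invented for this example.

```python
import io
import struct
import zipfile

# End of Central Directory record, without the trailing comment.
EOCD = struct.Struct("<4sHHHHIIH")


def find_central_directory(data):
    """Locate the EOCD from the end of the file and read where the CD lives."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no End of Central Directory signature found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, pos)
    return entries_total, cd_size, cd_offset


# Build a one-file, uncompressed archive in memory to try it on.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("hello.txt", "hello")
data = buffer.getvalue()

entries, cd_size, cd_offset = find_central_directory(data)
print(entries, cd_size, hex(cd_offset))
```

Because nothing before the Central Directory is ever consulted on the way in, anything at all may sit at the front of the file.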
rar.ksy, near line 151.


Let’s take a moment to ponder this last point. Our data (the
string we keep re-using) must be preceded by the PKZIP Local File
Header. But we've already added our RAR shenanigans which also
required our data to be preceded by something. (In that case, it
was the similarly named File Header.) How can we reconcile these
two facts? The trick lies in something I hinted at earlier! The
final field of the RAR File Header, which comes immediately prior
to the start of our string, is the filename of the to-be-extracted
file. Seeing as we aren’t too fussed by actually extracting this
string to a file with unrar, we can simply use this filename field
to store the PKZIP Local File Header! The downside is that we'll
end up with a nasty filename in our directory if we run unrar with
the x switch. (Try unrar p janus.com instead.) This seems like a
small price to pay in order for RAR and PKZIP to peacefully
coexist!
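
The nesting can be checked with a little arithmetic. The sketch below packs a 30-byte PKZIP Local File Header (no filename, no extra field) and treats it as the RAR filename. The RAR File Header offset of 0xD3 is read off Figure 14, the 32 bytes of fixed RAR fields come from summing the field widths in the RAR v4 layout, and the payload is a stand-in rather than the real string.

```python
import struct
import zlib

# PKZIP Local File Header with zero-length filename and extra field.
LFH = struct.Struct("<4sHHHHHIIIHH")

payload = b"example string\n"  # stand-in for the shared string

zip_local_header = LFH.pack(
    b"PK\x03\x04",
    10,                     # version needed to extract
    0, 0,                   # flags, method 0 = stored
    0, 0,                   # mod time, mod date (left blank here)
    zlib.crc32(payload),
    len(payload),           # compressed size
    len(payload),           # uncompressed size
    0, 0,                   # filename length, extra field length
)

RAR_FILE_HEADER_OFFSET = 0xD3  # as read from Figure 14
RAR_FIXED_FIELDS = 32          # CRC through Attributes, before the filename

rar_filename = zip_local_header  # the RAR "filename" is the ZIP header
rar_block_size = RAR_FIXED_FIELDS + len(rar_filename)
string_offset = RAR_FILE_HEADER_OFFSET + rar_block_size

print(len(rar_filename), hex(rar_block_size), hex(string_offset))
```

Both parsers therefore find their header ending exactly where the string begins.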

As other devotees of weird machines will no doubt know, when a
trick like smuggling binary data in filenames works with one
format, we are led to ask whether it will work elsewhere. If the
RAR specification outlines no consequences for unpleasantness in a
filename, does the PKZIP specification also afford us this luxury?
It does!

In contrast to the RAR format, the filename in 
PKZIP lies in the Central Directory rather than the 
Local File Header. This means that the filename 
according to PKZIP actually occurs later in the file, 
whereas RAR believes the filename lies just before 
the data begins. This trick wasn't actually needed 
based on the file formats that I selected for inclu- 
sion in my polyglot, but it may well be useful to you 
in future endeavours. In my case, I opted to place 
one of the 16-bit real mode routines into the PKZIP 
filename, namely the delay routine. When was the 
last time one of your binaries executed a filename as 
machine code? 


GNU Multiboot2 Shenanigans 


At what point do we call something a file format? How much format
does there have to be to a file? I ask because I have trouble
identifying this next inclusion with an actual file format. Indeed,
the GNU Multiboot2 format has a specification58 and a parser
(grub-file from the grub2 project). But... well, read on and see
for yourself if you agree with my feeling of cheekiness in
including it in my polyglot.


58 https://www.gnu.org/software/grub/manual/multiboot2/multiboot.html


[GNU] Multiboot2

1A0+4  Magic          0xE85250D6
1A4+4  Architecture   0 (i386)
1A8+4  Header length
1AC+4  Checksum


Figure 13: Multiboot’s side of Janus

The GNU Multiboot2 is a pretty straightforward 
specification that allows a bootloader like GRUB to 
boot a file without having to go via the BIOS. GRUB 
will parse a file top-to-bottom looking for the magic 
(0xE85250D6), so we can have anything we like both 
before and after the relevant bytes. In total, we re- 
quire four uint32’s worth of bytes, but we have to 
be 64-bit aligned, so I ended up with an additional 
four bytes of padding to round off the PKZIP End 
of Central Directory. 

The format is as follows: Magic, Architecture, Header Length,
Checksum. That’s it. I already mentioned that the magic is
0xE85250D6. The architecture value corresponding to i386 is simply
0x0 and the header length is self-explanatory. The only thing worth
commenting on here is the checksum. It's possibly the simplest
checksum I've ever encountered: the unsigned 32-bit sum of the
magic, architecture, header length and checksum is 0x0. Simple!
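
In Python the checksum rule comes out as a one-liner. The header length of 16 used here is a hypothetical value for four uint32 fields, not necessarily the value stored in janus, and `multiboot2_checksum` is a name made up for the sketch.

```python
import struct

MAGIC = 0xE85250D6
ARCH_I386 = 0x0


def multiboot2_checksum(header_length, arch=ARCH_I386):
    """Value that makes magic + arch + length + checksum wrap to zero."""
    return -(MAGIC + arch + header_length) & 0xFFFFFFFF


header_length = 16  # HYPOTHETICAL: four uint32 fields
checksum = multiboot2_checksum(header_length)
header = struct.pack("<4I", MAGIC, ARCH_I386, header_length, checksum)

total = sum(struct.unpack("<4I", header)) & 0xFFFFFFFF
print(hex(checksum), total)
```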

So, all that was required to be able to claim an- 
other file format in my polyglot was to find room for 
20 bytes, including four bytes of padding! Cheeky? 
Absolutely. Technically correct? Absolutely. 

If you have GRUB installed on your machine, you can test the
validity of the polyglot as a GNU Multiboot2 image with
grub-file --is-x86-multiboot2 janus.com. There should be no
output, but echo $? will inform you that grub-file returned 0.


Commodore 64 Shenanigans 


Up until this point, we’ve been playing around with 
well trodden parsers and specifications. It was cer- 
tainly a lot of fun getting to this point, but when 
I looked back at my in-progress polyglot in a hex 
editor, I saw lots of empty space. This displeased 
me. A certain idea had been bugging me for a 
while as I was working on this project: could I incorporate
support for an 8-bit computer? Back in the 80s, when 8-bit machines
reigned supreme, hard drives were prohibitively expensive for most
people, so programs were typically stored on floppies and
cassettes. My initial approach was to explore the tape format of
the ZX Spectrum, falsely expecting it to be reasonably malleable to
the kinds of distortions that are suitable for polyglotting. A week
went by before I realised that it wasn’t going to work. (For those
interested: Kaitai Struct already has excellent support for this
format.)

The next thing to try on my list was the Com- 
modore 64 PRG format, which turned out to only 
just be possible! As you’ll see further down, we end 
up having part of our ELF header form lines of BA- 
SIC, and we make use of 75% of a uint32. This was 
my first time playing with machines and architec- 
tures from this era, and it was a lot of fun! 

(Note to the reader: in keeping with 8-bit
tradition, hexadecimal values in this section are
prefixed with ‘$’.)

For any neighbour unacquainted with the won- 
ders of the Commodore 64, it is an 8-bit computer 
first released in 1982. It’s powered by an 8-bit 6502 
CPU and sports 64k of RAM. All pointers are two 
bytes long. The primary way to interface with the 
machine is the BASIC interpreter, which it boots 
to. There are several different file formats that can 
be loaded into memory from either floppy, cassette 
or even cartridge. (The cartridge was a distinctly 
North American luxury that my European ances- 
tors were seemingly deprived of.) In my case, I went 
for the most common file format: PRG, short for 
“program.” 

Before we even begin looking at the structure of 
these files, we need to know something about how 
they are loaded into memory. Indeed, confusingly 
enough there are two different ways: absolute and 
non-absolute. The difference is whether the Com- 
modore 64 will load the PRG file where it wants to 
be loaded, or just ignore it and load it to the start of 
BASIC RAM at $0800. This was important because 
of the lack of dynamic linking at the time; many 
programs had hard-coded offsets that required be- 
ing loaded to a particular address in order to make 
any sense. 
[Figure 14 is an annotated field listing of the archive structures
in file order: the Rar magic (0xBF), the Rar Main Header (0xC6, CRC
0x90CF), the Rar File Header (0xD3, block size 0x3E), the Zip Local
File Header doubling as the Rar filename (0xF3), the string
(0x111), the Rar Archive End (0x144), the Zip Central Directory
(0x14B, pointing back to the Local File Header at 0xF3) and the Zip
End of Central Directory (0x188, one entry, pointing to the Central
Directory at 0x14B).]

Figure 14: Rar and Zip’s sides of Janus
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We are very lucky that this is the case! The 
first two bytes of a PRG file are a pointer to where 
in memory the PRG is supposed to be loaded. In 
our case, this is $7£45 (the start of the ELF magic), 
which is not a valid location for a BASIC program to 
be loaded to. However, by loading our PRG in non- 
absolute mode, these bytes are ignored, although 
they must still be present. 
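
To make the two load modes concrete, here is a minimal Python
sketch of how a loader might decide where a PRG goes. The name
load_prg is hypothetical, and the address $0801 is an assumption
taken from this article's figures, which label the program as
loaded at $0801 (one byte into BASIC RAM). It is an illustration,
not the KERNAL's actual routine.

```python
BASIC_START = 0x0801  # assumed start of BASIC program text on a stock C64


def load_prg(data: bytes, absolute: bool) -> tuple[int, bytes]:
    """Return (address, payload) for a PRG image.

    The first two bytes are a little-endian load address. In absolute
    mode it is honoured; otherwise it is skipped and the payload goes
    to the start of BASIC program memory.
    """
    if len(data) < 2:
        raise ValueError("a PRG needs at least a two-byte load address")
    header_address = int.from_bytes(data[:2], "little")
    payload = data[2:]
    address = header_address if absolute else BASIC_START
    return address, payload


# The polyglot starts with the ELF magic, so its "load address" is junk.
elf_magic = bytes([0x7F, 0x45, 0x4C, 0x46])
absolute_address, _ = load_prg(elf_magic, absolute=True)
relocated_address, rest = load_prg(elf_magic, absolute=False)
print(hex(absolute_address), hex(relocated_address), rest)
```

Read as a little-endian value, the bytes 7F 45 name address
$457F; the article writes them in file order as $7f45.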


The next two bytes are supposed to be a pointer
to the first line of BASIC. We are stuck with this 
being $4c46. (This is the ‘LF’ of the ELF magic.) 
Non-absolute mode to the rescue! Our file is go- 
ing to just be parsed sequentially instead of hopping 
around for lines of BASIC to interpret. 

What comes next is a line of BASIC. I’m sure 
many readers will have written some BASIC before, 
even those like myself who are too young to have 
lived through BASIC’s heyday. But what does a 
line of BASIC look like on disk? Disk space was at a
premium back in the 80s, and it didn't make sense to
store entire words like PRINT, PEEK and POKE when 
a single byte could accomplish the same job. Fortu- 
nately for the programmers, commands like LIST au- 
tomatically converted the tokenized BASIC on disk 
and in memory to the much more familiar and ver- 
bose form that we all know. 

So, according to a PRG file, a line of BASIC is 
composed of: a two-byte little-endian line number, 
a single byte BASIC token, arguments in PETSCII 
(kinda like ASCII, as we'll see in a bit), and a NULL
terminator. Here we are at offset +0x4 into our ELF 
header, writing BASIC! Out of respect and defer- 
ence to the old ways, our first line number is going 
to be 10, but what are we going to actually do? 

As we don't have a whole lot of room to do much 
of anything in before the ELF header starts getting 
picky with us, we have to move our execution some- 
where else as soon as possible. The easiest thing 
to do is to make our BASIC program simply jump 
to some 6502 machine code with the SYS instruc- 
tion and then terminate. That sounds easy enough, 
apart from having to write 6502 assembly. Let's fo- 
cus on cramming our minimal BASIC program into 
what little space we have first, then we can figure 
out where to pass execution to later. 

On page 62, we have the first 24 bytes of 
janus.com, with both the ELF and Commodore 64 
interpretations of each byte. Let's take it from the 
top: 

As already mentioned, the first $7f45 pointer
would be the load address of the PRG if we loaded 
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in absolute mode, so these bytes are ignored, as are 
the next two bytes $4c46, which completes the ELF 
magic. 

Now comes $0a00, or “10”, which is our first BASIC
line number. The ELF parser believes this to be
EI_CLASS and EI_DATA. Next up we have $9e which 
is the BASIC token for the SYS instruction, which 
will jump to executing 6502 instructions at the dec- 
imal address we provide it. ELF parsers believe this 
byte to be EI_VERSION. Asking readelf, we are in- 
formed that the version is 158, or 0x9e in hex. So 
far so good! 

Next up is the argument to the SYS instruction: 
“(2491)”. The actual number is variable, and for 
a long time I left this as 1234 until I knew exactly 
where in memory my 6502 instructions would be. 
These bytes occupy the region that the ELF spec 
identifies as EI_PAD. (The elf man page is a terrific 
quick reference for all these structs. In this case 
we're looking at Elf64_Ehdr.)
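
As a rough illustration of how the same bytes serve both parsers,
the Python sketch below encodes the BASIC line and then reads the
result back as the start of an ELF header. The helper encode_line
and the dictionary elf_view are hypothetical names made up for
this example; the byte values come from the hex dump in Figure 15.

```python
SYS_TOKEN = 0x9E  # BASIC token for SYS
ELF_MAGIC = b"\x7fELF"


def encode_line(line_number: int, token: int, argument: str) -> bytes:
    """Encode one tokenized BASIC line, without its leading link pointer."""
    return (
        line_number.to_bytes(2, "little")  # two-byte little-endian line number
        + bytes([token])                   # single-byte BASIC token
        + argument.encode("ascii")         # space, digits and brackets match ASCII
        + b"\x00"                          # NULL terminator
    )


# Bytes 0-3 double as the PRG load address and the first link pointer.
ident = ELF_MAGIC + encode_line(10, SYS_TOKEN, " (2491)")
ident = ident.ljust(16, b"\x00")  # pad out to the 16 bytes of e_ident

# The same bytes, read the way an ELF parser reads e_ident.
elf_view = {
    "EI_CLASS": ident[4],
    "EI_DATA": ident[5],
    "EI_VERSION": ident[6],
}
print(ident.hex(" "))
print(elf_view)
```

The EI_VERSION value of 158 is the number readelf reports, and
it is simply the SYS token seen from the other side.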

Assuming our 6502 instructions do what we want
and culminate with an rts instruction, we will end
up back in BASIC and we should be good? But no,
our BASIC program will continue running, and we
need to gracefully finish it. Unfortunately, the next
few bytes form the e_type and e_machine fields of
the ELF header, which we cannot mess around with.
Any deviation from their current state will result in
the ELF not running under Linux.

So, what does the Commodore 64 think these 
bytes mean if we just leave them alone? First, no- 
tice that we're actually off-by-one between the ELF 
and Commodore 64 interpretations now: the final 
byte of EI_PAD is 0x00, but forms part of the $0002
pointer to the next line of BASIC. Similarly, the
0x02 byte is the start of the 0x0200 e_type field of
the ELF header! 

We have $0002 as a pointer to a line of BA- 
SIC, but that gets ignored unless we're in absolute 
mode (we aren't). The bytes that follow, $003e, are
the BASIC line number, in little-endian! 0x3e00 is 
15,872 in decimal, and indeed, if we run LIST on the 
Commodore 64 after loading this PRG, we see: 


10 SYS (2491) 
15872 


So, in other words, the second byte of e_type
and first byte of e_machine are interpreted as a BASIC
line number! Pretty cool! To finish up our BASIC
program, we have an instant null byte which
ends line 15872 of BASIC, which is also the second
byte of e_machine (0x3e00). Finally, we have a null
pointer to the next line of BASIC, followed by another
null. This indicates the end of the BASIC program.
But here we have the fun observation I mentioned
earlier: our BASIC program ends 3 bytes into
the e_version uint32. The last byte only matters
to the ELF parser! And with just one byte to spare!
The next field in the ELF header is e_entry, which
is the entrypoint of the program and something we
definitely can’t screw with. Phew!

7F 45 4C 46 0A 00 9E 20 28 32 34 39 31 29 00 00 02 00 3E 00 00 00 00 00

ELF interpretation:
  00-03  7F 45 4C 46               EI_MAGIC
  04     0A                        EI_CLASS
  05     00                        EI_DATA
  06     9E                        EI_VERSION
  07     20                        EI_OSABI
  08-0F  28 32 34 39 31 29 00 00   EI_PAD
  10-11  02 00                     e_type
  12-13  3E 00                     e_machine
  14-17  00 00 00 00               e_version

Commodore 64 interpretation:
  00-01  7F 45                     Load address (ignored)
  02-03  4C 46                     Pointer to first line (ignored)
  04-05  0A 00                     Line Number (10)
  06     9E                        SYS Token
  07-0D  20 28 32 34 39 31 29      " (2491)"
  0E     00                        End of line
  0F-10  00 02                     Pointer to next line (ignored)
  11-12  00 3E                     Line Number (15872)
  13     00                        Empty line
  14-15  00 00                     Pointer to next line (ignored)
  16     00                        Empty line, FIN

Figure 15: First 24 Bytes of Polyglot, labeled for ELF and Commodore 64 parsing
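
The walk through the header can be checked with a small Python
sketch that parses the 24 bytes sequentially, roughly the way LIST
would show them after a non-absolute load. The function
list_program and the one-entry TOKENS table are assumptions made
for this example. It is a simplification: the real interpreter
rebuilds the link pointers itself, whereas this sketch only tests
them for null.

```python
HEADER = bytes.fromhex(
    "7f454c46" "0a00" "9e" "20283234393129" "00"  # magic, line 10, SYS, " (2491)", end
    "0002" "003e" "00"                            # link, line 15872, end of line
    "0000" "0000"                                 # null link, then the spare bytes
)

TOKENS = {0x9E: "SYS"}  # only the token this program needs


def list_program(prg: bytes) -> list[tuple[int, str]]:
    """Walk a PRG sequentially and return (line number, text) pairs."""
    lines = []
    pos = 2  # skip the two-byte load address
    while pos + 2 <= len(prg):
        link = prg[pos:pos + 2]
        if link == b"\x00\x00":  # a null link pointer ends the program
            break
        pos += 2  # any other link value is ignored
        number = int.from_bytes(prg[pos:pos + 2], "little")
        pos += 2
        end = prg.index(0, pos)  # each line runs to its NULL terminator
        text = "".join(TOKENS.get(b, chr(b)) for b in prg[pos:end])
        lines.append((number, text))
        pos = end + 1
    return lines


for number, text in list_program(HEADER):
    print(number, text)
```

The second line comes out empty, which matches the bare 15872
that LIST prints.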


Okay great, we can just about smuggle a very 
small BASIC program into the ELF header which 
will jump to any location we specify and start ex- 
ecuting 6502 instructions. To top it off, as long as 
we finish our machine code with an rts instruction, 
our program will gracefully terminate and return to 
the BASIC prompt! So, what are we going to do? 
Print our string of course! 

If, like me, you have never written or even looked 
at 6502 assembly before, you'll see that it’s not too 
hard, but does have a couple of quirks. For one, the 
6502 is an 8-bit CPU, but the Commodore 64 has 
64k of memory. This means that every pointer takes 
two store operations to store in memory: one for the 
low byte, and another for the high byte. 

Secondly, the first page (256 bytes) of memory is
called the zero page. (The high byte of any pointer
indicates which page it’s in.) The zero page is special
because we can index structures within it with
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just a single byte, i.e. we only need one register 
to be an index into our string if we store it in the 
zero-page. 



Despite ASCII being nearly twenty years old
when the C64 was first released, the machine instead uses
PETSCII, which supports two slightly different layouts.
At boot, it has the first character set loaded,
which only has capital letters. Our string has lowercase
letters too, but if we try printing it now, we'll
see it all caps. We can load the alternative character
set (which does include lowercase) by “printing”
the byte 0x0e. We do this using the C64 CHROUT
routine which lives at $ffd2 in the Commodore's
KERNAL ROM. All we have to do is put 0x0e in
the A register and jump to the right address ($ffd2):


lda #0x0e
jsr $ffd2


Next we have to store a pointer to our string in 
the zero-page. I chose $0020 for this, so we'll be
storing bytes at $0020 and $0021. Instead of work- 
ing out manually where my string would be, I just 
loaded the binary in the VICE emulator and used 
the built-in monitor (debugger to you and me), to 
see where it ended up. It turns out the string lives 
at $0910. (BASIC RAM starts at $0800, so this 
feels about right.) Storing the pointer simply looks 
like: 


lda #0x09 ; Load 0x09 in A
sta $21 ; Store byte in A in address $0021
lda #0x10 ; Load 0x10 in A
sta $20 ; Store byte in A in address $0020
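
Rather than reading the address out of a debugger, it can also be
worked out by hand. The Python sketch below assumes the program
text lands at $0801 and that the two-byte load address is not
copied to RAM. The file offsets 0x111 (the string) and 0x1BC (the
6502 code) are read off the annotated hex dumps in this article's
figures, and the function names are hypothetical.

```python
BASIC_START = 0x0801   # assumed load address in non-absolute mode
LOAD_ADDRESS_SIZE = 2  # the PRG's first two bytes are not copied to RAM


def file_offset_to_address(offset: int) -> int:
    """Map an offset in janus.com to the C64 address it lands at."""
    return BASIC_START + offset - LOAD_ADDRESS_SIZE


def split_pointer(address: int) -> tuple[int, int]:
    """Return (low, high) bytes, the order they sit in memory."""
    return address & 0xFF, address >> 8


string_address = file_offset_to_address(0x111)  # string offset per the figures
code_address = file_offset_to_address(0x1BC)    # 6502 code offset per the figures
low, high = split_pointer(string_address)

zero_page = bytearray(256)
zero_page[0x20] = low   # what "sta $20" stores
zero_page[0x21] = high  # what "sta $21" stores
print(hex(string_address), code_address, hex(low), hex(high))
```

Under those assumptions the string lands at $0910 and the code at
2491 ($09BB), which agrees with both the debugger and the SYS
argument.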


A little unusual to modern eyes, but still pretty 
straightforward. Lastly, we just need to write some 
logic to loop over our string, checking for a null-byte 
terminator, and then return control to the BASIC 
interpreter with rts. 

There are two final quirks to consider. First, the
Commodore 64 has a 40-character wide display, but 
my string is longer than that. I opted to include 
a manual line break after 33 characters have been 
printed just so things wrap in a nice way. Similarly, 
I also print another line break when we're done so 
that the BASIC prompt appears neatly on the next 
line. 

The other quirk deals with PETSCII again. The
string in memory is ASCII because that's what every
other format that uses it expects. Is converting from
ASCII to PETSCII going to be a royal pain? As fortune
would have it, in this second PETSCII character
set, the byte representations of the alphanumeric
characters differ only in the sixth least significant
bit (0x20)! The alphanumeric characters begin at 0x40
onwards, so we only need to make the conversion for
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bytes larger than that. Therefore in our character 
printing routine that the string printing routine calls 
each loop, we can simply do the following (the ASCII 
byte to print is in the A register): 


cmp #0x40 ; Compare byte in A to 0x40
bcc +$2 ; Branch if Carry Clear to the jmp
; instruction (i.e. if A < 0x40)
eor #0x20 ; Toggle 6th bit
jmp $ffd2 ; Jump to CHROUT in KERNAL ROM


We check to see if the byte is 0x40 or greater
(0x41 is ‘a’ in PETSCII character set 2). If it is, we
bitwise-XOR it with 0b00100000 to flip the 6th bit, and
then jump to the CHROUT routine in ROM.
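
The same conversion is easy to try out in Python. This sketch
mirrors the cmp/bcc/eor sequence byte for byte; the function name
ascii_to_petscii and the sample message are made up for
illustration and are not part of Janus.

```python
def ascii_to_petscii(byte: int) -> int:
    """Mirror the cmp/bcc/eor sequence: flip bit 0x20 when byte >= 0x40."""
    if byte < 0x40:     # bcc: carry clear means A < 0x40, so skip the eor
        return byte
    return byte ^ 0x20  # eor #0x20


message = b"Hello, C64!"  # hypothetical sample text
converted = bytes(ascii_to_petscii(b) for b in message)
print(converted)
```

Viewed as ASCII the result has its case swapped, which is the
point: in the second PETSCII character set those swapped codes
display as the original text. Note that the routine also flips
punctuation at or above 0x40, such as '@' and '['.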

Putting everything together, our 6502 assembly 
looks like this: 


lda #0x0e ; Full Character Set
jsr $ffd2 ; CHROUT
lda #0x09
sta $21 ; High Byte of String
lda #0x10
sta $20 ; Low Byte of String
jsr $09cc ; Call PRINTSTR
rts ; Return to BASIC
PRINTSTR:
ldy #0x0 ; Reset Y register to 0
LOOP:
lda ($20),y ; Read char from zero-page
cpy #$21 ; Past 33 characters?
beq +$b ; If so, jump to EXTRACR
cmp #$00 ; Null-terminator?
beq +$d ; If so, jump to DONE
jsr $09eb ; Jump to PRINTCHAR
iny ; Increment Y
jmp $09ce ; Jump to LOOP
EXTRACR:
jsr $09e6 ; Jump to PRINTCR
jmp $09d4 ; Return to LOOP
DONE:
rts ; Return
PRINTCR:
lda #13 ; Store CR in A
jmp $09eb ; Jump to PRINTCHAR
PRINTCHAR:
cmp #0x40 ; Less than 0x40?
bcc +$2 ; If so, skip the eor and jump to DONE
eor #0x20 ; Convert ASCII to PETSCII
DONE:
jmp $ffd2 ; CHROUT Routine


As you can see, it’s pretty similar to any other
string printing routine in assembly. (For example,
the one we wrote for the 16-bit real mode portion
of this polyglot.) Sure, there are a couple of extra
quirks in there, but nothing too hazardous. Notice
how we were able to use the Y register to index our
string in the zero-page.
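
Because the listing uses hard-coded addresses and relative
branches, it is worth checking that they line up. The Python
sketch below lays the instructions out from the SYS target and
recomputes the jump targets. The instruction sizes are the
standard 6502 encodings; the labels BEQ_EXTRACR, CHECK_NULL and
BEQ_DONE are hypothetical helpers added for this check and do not
appear in the original listing.

```python
CODE_START = 2491  # $09BB, the SYS argument from the BASIC stub

# (label or None, mnemonic, size in bytes), in listing order
LISTING = [
    (None, "lda #", 2), (None, "jsr", 3), (None, "lda #", 2), (None, "sta zp", 2),
    (None, "lda #", 2), (None, "sta zp", 2), (None, "jsr", 3), (None, "rts", 1),
    ("PRINTSTR", "ldy #", 2),
    ("LOOP", "lda (zp),y", 2), (None, "cpy #", 2), ("BEQ_EXTRACR", "beq", 2),
    ("CHECK_NULL", "cmp #", 2), ("BEQ_DONE", "beq", 2), (None, "jsr", 3),
    (None, "iny", 1), (None, "jmp", 3),
    ("EXTRACR", "jsr", 3), (None, "jmp", 3),
    ("DONE", "rts", 1),
    ("PRINTCR", "lda #", 2), (None, "jmp", 3),
    ("PRINTCHAR", "cmp #", 2), (None, "bcc", 2), (None, "eor #", 2), (None, "jmp", 3),
]


def assign_addresses(listing, start):
    """Return ({label: address}, end address) for a list of sized instructions."""
    labels, address = {}, start
    for label, _mnemonic, size in listing:
        if label:
            labels[label] = address
        address += size
    return labels, address


def branch_target(branch_address: int, offset: int) -> int:
    """6502 relative branches count from the instruction after the branch."""
    return branch_address + 2 + offset


labels, end = assign_addresses(LISTING, CODE_START)
for name in ("PRINTSTR", "LOOP", "CHECK_NULL", "PRINTCR", "PRINTCHAR"):
    print(name, hex(labels[name]))
print(hex(branch_target(labels["BEQ_EXTRACR"], 0x0B)), hex(labels["EXTRACR"]))
print(hex(branch_target(labels["BEQ_DONE"], 0x0D)), hex(labels["DONE"]))
```

The computed addresses agree with the $09cc, $09ce, $09d4, $09e6
and $09eb operands in the listing, and both beq offsets land on
their intended labels.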

The final part of this Commodore 64 addition is
how to load this thing. I've mentioned that it's vital
to load this PRG in non-absolute mode so that the
ELF header can coexist with our BASIC program.
This is simple, and can be specified when we use
the BASIC LOAD instruction: LOAD "janus.com",8
is all it takes. Notice the lack of an extra ,1 which
is usually seen with the LOAD command. This extra
argument is used to specify whether we are loading
in absolute mode or not! Alternatively, if using the
VICE emulator like I was, the -basicload argument
does this for us.



[59] git clone https://github.com/xcellerator/janus
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Summary 


Thank you for joining me on this journey, fellow
computer-enjoyers. This whole process was a wild 
ride of mixed emotions. These 512 bytes took me a 
few months to assemble into their final form. Like 
2020's inaugural Binary Golf Grand Prix, I was con- 
vinced that I wouldn't be able to produce an entry, 
but just kept working on it until something started 
to come together. Like many readers of this fine 
journal, I had read the many prior articles on poly- 
glot techniques, but had yet to attempt one of my 
own. 


If you think that this sounds like fun, then you're 
in luck! The Binary Golf Grand Prix has run now 
for four years and rumours have it that there are 
already plans for 2024. 



Thanks go to @netspooky for creating and mas- 
terminding this competition. Thanks also to ev- 
eryone who submitted entries last year, as well as 
the Binary Golf Association for comprehending and 
scoring them all. 


So this is my submission in all its glory: an x86
bootloader, ELF, COM, RAR, ZIP, GNU Multiboot2
Image, and Commodore 64 PRG hybrid. You
can find this project with a full nasm listing on
GitHub.[59]


Until next time! 


[Figure 16: PRG’s side of Janus. An annotated hex dump of the
Commodore 64 view of the file: the BASIC stub at the start
(loaded at $0801), the string at file offset 0x111 ($0910),
and the 6502 code at file offset 0x1BC ($09BB), with the
PRINTSTR routine at $09CC and the PRINTCHAR routine at $09EB.]




Corrections 


Those more familiar with Commodore BASIC than 
I might know that the brackets around the argu- 
ment to the SYS instruction are not required. The 
KERNAL will simply ignore them when parsing the 
line. Perhaps without the minimum size limitation 
brought about by the bootloader, there might be a 
way to save more space in a PRG/ELF hybrid. 

As Janus began to take form, I needed to know 
how many bytes were left that didn't impact my 
tests. I kept setting all the null bytes (excluding 
padding for things like integers) to 58 while making 
sure the functionality was unaffected. This makes 
them stand out nicely in a hexdump so that I could 
find the large unused chunks. However, as pointed
out by my editors, there was an unintended consequence!
All the way down at offset 0x19c are four
bytes of 58, labeled as padding to properly
align the GNU Multiboot 2 image to 64 bits. The
first two of these bytes are also the length of the com- 
ment of the PKZip file. It pains me that I missed 
the opportunity for some added neatness by setting 
these two bytes back to 00, but the SHA256 hashes 
have me stuck in a bind. 
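
To see why those two bytes count as the comment length, here is a
short Python sketch of the standard 22-byte End of Central
Directory record. The record offset 0x188 and the size and offset
values are read off this article's figures; the packed record
itself is a stand-in built for the example, and treating "58" as
the hexadecimal byte 0x58 is an assumption.

```python
import struct

# End of Central Directory record fields, in order:
# signature, disk, cd_disk, entries_on_disk, entries, cd_size, cd_offset, comment_len
EOCD = struct.Struct("<4sHHHHIIH")
EOCD_OFFSET = 0x188  # where the figures place the PK\5\6 signature


def comment_length_offset(eocd_offset: int) -> int:
    """File offset of the two-byte comment length field."""
    return eocd_offset + EOCD.size - 2


# A stand-in record: one entry, with 0x58 filler bytes as the comment length.
record = EOCD.pack(b"PK\x05\x06", 0, 0, 1, 1, 0x3D, 0x14B, 0x5858)
*_, comment_len = EOCD.unpack(record)
print(hex(comment_length_offset(EOCD_OFFSET)), hex(comment_len))
```

The comment length field falls at 0x19c, exactly where the filler
bytes begin, so a Zip parser would read them as a comment length.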


[Figure 17: All sides of Janus. A single annotated hex dump of
the whole file, labeled for every format it satisfies: C64 PRG,
ELF, COM/MBR, Rar, Zip, and GNU Multiboot 2.]
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22:12 Let's bide our time for a brand new adventure! 


from the desk of Pastor Manul Laphroaig, 
Tract Association of PoC||GTFO.


Dearest neighbor, 


Our scruffy little gang started this самиздат 
journal a few years back because we didn’t much like 
the academic ones, but also because we wanted to 
learn new tricks for reverse engineering. We wanted 
to publish the methods that make exploits and poly- 
glots possible, so that folks could learn from each 
other. Over the years, we’ve been blessed with the 
privilege of editing these tricks, of seeing them early, 
and of seeing them through to print. 

So today, in that spirit of exploration and won- 
der, I pass around the collection plate and ask you, 
neither for paper money nor pocket change, but for 
nifty projects and the clever tricks that make them 
possible. 


Blessed be the hackers who seek to share their
knowledge with the world. Maybe share a technical
story from the good old days. Maybe share a
clever trick from the modern day, such as how to
improve TMS320 support in your favorite disassembler
or how to reverse engineer a binary format using
custom visualizations.


May you be blessed with the generosity to share 
your runnable source code and buildable hardware 
schematics, so that others may build upon your 
work. May you be blessed with the patience to 
explain how you got to your result, so that others 
may learn from your experience. Teach me to iden- 
tify those things that only look intimidating without 
context, and arm me with the tools to conquer those 
problems. 

Give me these tricks and techniques in an ASCII 
textfile, or UTF-8 if your language insists, includ- 
ing high resolution figures as separate PNG or PDF 
files as an email to pastor@phrack.org. We've 
taken submissions hand drawn on napkins, but we'd 
like to avoid that when possible. My gang and I 
will clean it up, typeset it in TEX, index it and 
print it for the world. We'll happily translate from 
French, Spanish, Portuguese, German, Ukrainian, 
Hungarian, Hebrew, Serbo-Croatian, and Southern
Appalachian. 


Yours in PoC and Pwnage, 
Pastor Manul Laphroaig, T.G. S.B.